Abstract. We put forward a model of action-based randomization mechanisms to analyse
quantitative information flow (QIF) under generic leakage functions, and under possibly
adaptive adversaries. This model subsumes many of the QIF models proposed so far. Our
main contributions include the following: (1) we identify mild general conditions on the
leakage function under which it is possible to derive general and significant results on
adaptive QIF; (2) we contrast the efficiency of adaptive and non-adaptive strategies, showing
that the latter are as efficient as the former in terms of length up to an expansion factor
bounded by the number of available actions; (3) we show that the maximum information
leakage over strategies, given a finite time horizon, can be expressed in terms of a Bellman
equation. This can be used to compute an optimal finite strategy recursively, by resorting
to standard methods like backward induction.


1. Introduction 

Quantitative Information Flow (QIF) is a well-established approach to confidentiality
analysis: the basic idea is measuring how much information flows from sensitive to observable
data, relying on tools from Information Theory.

Two important issues that arise in QIF are: what measure one should adopt to quantify
the leakage of confidential data, and the relationship between adaptive and non-adaptive
adversaries. Concerning the first issue, a long-standing debate in the QIF community
concerns the relative merits of leakage functions based on Shannon entropy and on
min-entropy; other types of entropies are sometimes considered (see e.g. [26]). As a matter
of fact, analytical results for each of these types of leakage functions have so far been
worked out in a non-uniform, ad hoc fashion.

Concerning the second issue, one sees that, with the notable exception of [26] which we
discuss later on, the treatment of confidentiality in QIF has so far been almost exclusively
confined to attackers that can only passively eavesdrop on the mechanism; or, at best,
obtain answers in response to queries (or actions) submitted in a non-adaptive fashion.
By passive, we mean here attackers that can eavesdrop on messages, but not interfere
with the generation process of these messages. Clearly, there are situations where this
model is not adequate. To mention but two: chosen plaintext/ciphertext attacks against
cryptographic hardware or software; adaptive querying of databases whose records contain
both sensitive and non-sensitive fields.

In this paper, we tackle both issues outlined above. We: (a) put forward a general QIF
model where the leakage function is built around a generic uncertainty measure; and (b)
derive several general results on the relationship between adaptive and non-adaptive
adversaries in this model. In more detail, we assume that, based on a secret piece of information
$X \in \mathcal{X}$, the mechanism responds to a sequence of queries/actions $a_1, a_2, \ldots$ ($a_i \in Act$),
adaptively submitted by an adversary, thus producing a sequence of answers/observations
$Y \in \mathcal{Y}^*$. Responses to individual queries are in general probabilistic, either because of the
presence of noise or by design. Moreover, the mechanism is stateless, thus answers
are independent from one another. The adversary is assumed to know the distribution
according to which $X$ has been generated (the prior) and the input-output behaviour of
the mechanism. An adaptive adversary can choose the next query based on past
observations, according to a predefined strategy. Once a strategy and a prior have been fixed, they
together induce a probability space over sequences of observations. Observing a specific
sequence provides the adversary with information that modifies his belief about $X$,
possibly reducing his uncertainty. We measure information leakage as the average reduction in
uncertainty. We work with a generic measure of uncertainty, $U(\cdot)$. Formally, $U(\cdot)$ is just a
real-valued function over the set of probability distributions on $\mathcal{X}$, which represent possible
beliefs of the adversary. Just two properties are assumed of $U(\cdot)$: concavity and
continuity. Note that leakage functions commonly employed in QIF, such as Shannon entropy,
guessing entropy and error probability - the additive version of Smith’s min-entropy-based
vulnerability [32] - do fall in this category.

The other central theme of our study is the comparison between adaptive and the 
simpler non-adaptive strategies. All in all, our results indicate that, for even moderately 
powerful adversaries, there is no dramatic difference between the two, in terms of difficulty 
of analysis. A more precise account of our contributions follows. 

• We put forward a general model of adaptive QIF; we identify mild general conditions 
on the uncertainty function under which it is possible to derive general and substantial 
results on adaptive QIF. 

• We compare the difficulty of analyzing mechanisms under adaptive and non-adaptive
adversaries. We first note that, for the class of mechanisms admitting a “succinct”
syntactic description - e.g. devices specified by boolean formulae - the analysis problem is
intractable (NP-hard), even if limited to very simple instances of the non-adaptive case.
This essentially depends on the fact that such mechanisms can feature exponentially many
actions in the syntactic size of the description. In the general case, we show that
non-adaptive finite strategies are as efficient as adaptive ones, up to an expansion factor in
their length bounded by the number of distinct actions available. Practically, this
indicates that, for mechanisms described in explicit form (e.g. by tables, like a database), hence
featuring an “affordable” number of actions available to the adversary, it may be sufficient
to assess resistance of the mechanism against non-adaptive strategies. This is important,
because simple analytical results are available for such strategies [7].

Footnote: Passivity in this sense does not rule out attackers that can try secrets, based on
some form of oracle. Active attackers are also considered in approaches to quantitative
integrity, a theme that will not be considered here.

• We show that the maximum leakage over all strategies is the same for both adaptive 
and non-adaptive adversaries, and only depends on an indistinguishability equivalence 
relation over the set of secrets. 

• We show that maximum leakage over all strategies over a finite horizon can be expressed
in terms of a Bellman equation. This equation can be used to compute optimal finite
strategies recursively. As an example, we show how to do that using Markov Decision
Processes (MDPs) and backward induction.

• We finally give a Bayesian decision-theoretic justification of our definition of uncertainty 
function. We argue that each such function arises as a measure of expected loss; and, 
vice-versa, that (under a mild condition) each measure of expected loss is in fact an 
uncertainty function in our sense. 


Structure of the Paper. Section 2 introduces the model. This is illustrated with a few
examples in Section 3. The subsequent four sections 4, 5, 6, 7 discuss the results outlined
in (2), (3), (4) and (5) above, respectively. Section 8 contains a few concluding remarks,
discussion of related work and some directions for further research. Some technical material
has been confined to three separate appendices.

2. A Model of Adaptive QIF

2.1. Randomization mechanisms, uncertainty, strategies. 

Definition 2.1. An action-based randomization mechanism is a 4-tuple

$\mathcal{S} = (\mathcal{X}, \mathcal{Y}, Act, \{M_a : a \in Act\})$,

where (all sets finite and nonempty): $\mathcal{X}$, $\mathcal{Y}$ and $Act$ are respectively the sets of secrets,
observations and actions (or queries) and for each $a \in Act$, $M_a$ is a stochastic matrix of
dimensions $|\mathcal{X}| \times |\mathcal{Y}|$.

For each action $a \in Act$, $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, the element of row $x$ and column $y$ of
$M_a$ is denoted by $p_a(y|x)$. Note that for each $x$ and $a$, row $x$ of $M_a$ defines a probability
distribution over $\mathcal{Y}$, denoted by $p_a(\cdot|x)$. A mechanism $\mathcal{S}$ is deterministic if each entry of
each $M_a$ is either 0 or 1. Note that to any deterministic mechanism there corresponds
a function $f : \mathcal{X} \times Act \to \mathcal{Y}$ defined by $f(x,a) = y$, where $p_a(y|x) = 1$. Recall that a
real function $F$ defined over a convex subset $C \subseteq \mathbb{R}^n$ is concave if, for each $\lambda \in [0,1]$ and
$u, v \in C$, it holds true that $F(\lambda u + (1-\lambda)v) \ge \lambda F(u) + (1-\lambda)F(v)$. In the rest of the
paper, we let $\mathcal{P}$ denote the set of all probability distributions on $\mathcal{X}$; this can of course be
seen as a convex subset of $\mathbb{R}^{|\mathcal{X}|}$ (the probability simplex).

Footnote: The term information hiding system is sometimes found in the literature to indicate
randomization mechanisms. This term, however, is also used with a rather different technical
meaning in the literature on watermarking; so we prefer to avoid it altogether here.

Definition 2.2 (Uncertainty). A function $U : \mathcal{P} \to \mathbb{R}$ is an uncertainty measure if it is
concave and continuous over $\mathcal{P}$.

We postpone a full justification of the above definition to Section 7. For the time being,
we can explain intuitively the role of concavity as follows. Suppose the secret is generated
according to either a distribution $p$ or to another distribution $q$, the choice depending on
a coin toss, with probability of heads $\lambda$. The coin toss introduces extra randomness in the
generation process. Therefore, the overall uncertainty of the adversary about the secret,
$U(\lambda \cdot p + (1-\lambda) \cdot q)$, should be no less than the average uncertainty of the two original
generation processes considered separately, that is $\lambda U(p) + (1-\lambda) U(q)$. As a matter of fact,
most uncertainty measures in QIF do satisfy concavity. Continuity is a technical requirement
that comes into play only in a theorem of Section 5.

Example 2.3. The following entropy functions, and variations thereof, are often considered
in the quantitative security literature as measures of the difficulty or effort necessary for a
passive adversary to identify a secret $X$, where $X$ is a random variable over $\mathcal{X}$ distributed
according to a known distribution $p(\cdot)$. All of these functions are easily proven to be
uncertainty measures in our sense:

• Shannon entropy: $H(p) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$, with $0 \log 0 = 0$ and log in base 2;

• Error probability entropy: $E(p) = 1 - \max_{x \in \mathcal{X}} p(x)$;

• Guessing entropy: $G(p) = \sum_{i} i \cdot p(x_i)$ with $p(x_1) \ge p(x_2) \ge \cdots \ge p(x_n)$.
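
As an illustration, the three measures of Example 2.3 can be evaluated and their concavity spot-checked numerically. The following is a minimal sketch and not part of the original development: distributions are plain Python lists, all helper names are hypothetical, and the guessing entropy is assumed to sum over all ranks $i = 1, \ldots, n$. A random test can only fail to find a counterexample; it does not prove concavity.

```python
from math import log2
import random

def shannon(p):
    # H(p) = -sum p(x) log2 p(x), with the convention 0 log 0 = 0
    return -sum(q * log2(q) for q in p if q > 0)

def error_entropy(p):
    # E(p) = 1 - max_x p(x): probability that the best single guess is wrong
    return 1 - max(p)

def guessing_entropy(p):
    # Assumption: ranks run over i = 1..n, probabilities sorted in decreasing order
    ranked = sorted(p, reverse=True)
    return sum(i * q for i, q in enumerate(ranked, start=1))

def mix(p, q, lam):
    # Convex combination lam*p + (1-lam)*q of two distributions
    return [lam * a + (1 - lam) * b for a, b in zip(p, q)]

def random_dist(n, rng):
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]

def concave_on_samples(U, n=4, trials=2000, seed=0, tol=1e-9):
    # Checks U(lam*p + (1-lam)*q) >= lam*U(p) + (1-lam)*U(q) on random samples
    rng = random.Random(seed)
    for _ in range(trials):
        p, q, lam = random_dist(n, rng), random_dist(n, rng), rng.random()
        if U(mix(p, q, lam)) < lam * U(p) + (1 - lam) * U(q) - tol:
            return False
    return True
```

For instance, `concave_on_samples(shannon)` is expected to return `True`, in agreement with the claim that these functions are uncertainty measures.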

Example 2.4. For a somewhat different example of uncertainty, suppose that $\mathcal{X} \subseteq \mathbb{R}$.
Then each probability distribution over $\mathcal{X}$ corresponds to a real valued r.v. $X$, and we can
set $U(X) = \mathrm{var}(X)$, where $\mathrm{var}(X) = E[(X - \mu)^2]$, with $\mu = E[X]$, is the familiar variance.
This makes intuitive sense, as the higher the variability of $X$, the higher the uncertainty
about its value. Let us check that $\mathrm{var}(X)$ is concave and continuous.

Indeed, continuity follows immediately from the definition. Concerning concavity, first
note that, for any real $z$, $E[(X - z)^2] = \mathrm{var}(X) + (\mu - z)^2$ (easily checked by writing
$(X - z)^2$ as $((X - \mu) + (\mu - z))^2$, then expanding the square and then applying linearity of
expectation). This implies that $E[(X - z)^2] \ge \mathrm{var}(X)$. Now, let $p, q$ be any two distributions
on $\mathcal{X}$, let $\lambda \in [0,1]$ and $r = \lambda \cdot p + (1-\lambda) \cdot q$. Denoting by $\mu_\pi$ and $\mathrm{var}_\pi$, respectively, the
expectation and variance of $X$ taken according to a distribution $\pi$, we have the following.

$$\begin{aligned}
\mathrm{var}_r(X) &= E_r[(X - \mu_r)^2] \\
&= \lambda E_p[(X - \mu_r)^2] + (1-\lambda) E_q[(X - \mu_r)^2] \\
&\ge \lambda\, \mathrm{var}_p(X) + (1-\lambda)\, \mathrm{var}_q(X).
\end{aligned}$$

We note that the min-entropy function, $H_\infty(p) = -\log \max_x p(x)$, is neither concave
nor convex, so it does not fit in the present framework. However, one can at least indirectly
express min-entropy via the error probability entropy $E(\cdot)$: $H_\infty(p) = -\log(1 - E(p))$.
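
The remark on min-entropy can be checked on two small counterexamples. The sketch below is only illustrative: the distributions are hand-picked two-point examples and the helper names are hypothetical. Mixing a point mass with the uniform distribution violates concavity, while mixing two distinct point masses violates convexity.

```python
from math import log2

def min_entropy(p):
    # H_inf(p) = -log2 max_x p(x)
    return -log2(max(p))

def error_entropy(p):
    # E(p) = 1 - max_x p(x)
    return 1 - max(p)

def mix(p, q, lam):
    return [lam * a + (1 - lam) * b for a, b in zip(p, q)]

# Concavity would require gap >= 0; here the mixture is [0.75, 0.25].
p, q = [1.0, 0.0], [0.5, 0.5]
concavity_gap = min_entropy(mix(p, q, 0.5)) - (0.5 * min_entropy(p) + 0.5 * min_entropy(q))

# Convexity would require gap <= 0; here the mixture is uniform.
p2, q2 = [1.0, 0.0], [0.0, 1.0]
convexity_gap = min_entropy(mix(p2, q2, 0.5)) - (0.5 * min_entropy(p2) + 0.5 * min_entropy(q2))

# The identity H_inf(p) = -log2(1 - E(p)) on the first mixture.
r = mix(p, q, 0.5)
identity_error = abs(min_entropy(r) + log2(1 - error_entropy(r)))
```

A negative `concavity_gap` and a positive `convexity_gap` together confirm that min-entropy is neither concave nor convex on the simplex.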

A strategy is a partial function $\sigma : \mathcal{Y}^* \to Act$ such that $\mathrm{dom}(\sigma)$ is non-empty and
prefix-closed. A strategy is finite if $\mathrm{dom}(\sigma)$ is finite. The length of a finite strategy
is defined as $\max\{|\xi| : \xi \in \mathrm{dom}(\sigma)\} + 1$. For each $n \ge 0$ we will let $y^n, w^n, z^n, \ldots$
range over sequences in $\mathcal{Y}^n$; given $y^n = (y_1, \ldots, y_n)$ and $0 \le j \le n$, we will let $y^j$
denote the first $j$ components of $y^n$, $(y_1, \ldots, y_j)$. Given a strategy $\sigma$ and an integer
$n \ge 0$, the truncation of $\sigma$ at level $n$, denoted as $\sigma \backslash n$, is the finite strategy obtained by
restricting $\sigma$ to the sequences of its domain up to level $n$. A finite strategy of length $l$ is
complete if $\mathrm{dom}(\sigma) = \bigcup_{0 \le i \le l-1} \mathcal{Y}^i$. A strategy $\sigma$ is non-adaptive if whenever
$y^n$ and $w^n$ are two sequences of the same length then $\sigma(y^n) = \sigma(w^n)$ (that is, the decision
of which action to play next only depends on the number of past actions); note that finite
non-adaptive strategies are necessarily complete. We note that strategies can be described as
trees, with nodes labelled by actions and arcs labelled by observations, in the obvious way.
Any non-adaptive strategy also enjoys a simpler representation as a finite or infinite list of
actions: we write $\sigma = [a_1, \ldots, a_i, \ldots]$ if $\sigma(y^{i-1}) = a_i$, for $i = 1, 2, \ldots$.

Footnote: In fact, concavity does imply continuity except possibly on the frontier of $\mathcal{P}$.

Footnote: A set $B \subseteq \mathcal{Y}^*$ is prefix-closed if whenever $\xi \in B$ and $\xi'$ is a prefix of $\xi$ then $\xi' \in B$.

Figure 1. Two strategy trees.


Example 2.5. Strategies $\sigma = [\varepsilon \mapsto a, y \mapsto b]$ and $\sigma' = [\varepsilon \mapsto a, y \mapsto b, y' \mapsto c, yy' \mapsto d]$ can
be represented as in Figure 1. Note that the tree’s height is one less than the strategy’s
length.
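
The notions of strategy, length, completeness and non-adaptivity can be made concrete with a small data-structure sketch. This is only an illustration under stated assumptions: a finite strategy is encoded as a Python dictionary from tuples of observations to actions, the observation set is assumed to be $\{y, y'\}$ for the strategies of Example 2.5, and all function names are hypothetical.

```python
from itertools import product

def is_strategy(sigma):
    # The domain must be non-empty and prefix-closed
    return bool(sigma) and all(ys[:k] in sigma for ys in sigma for k in range(len(ys)))

def length(sigma):
    # Length = longest sequence in the domain, plus one
    return max(len(ys) for ys in sigma) + 1

def is_complete(sigma, observations):
    # Complete: the domain is exactly the set of sequences of length 0..l-1
    l = length(sigma)
    full = {ys for n in range(l) for ys in product(observations, repeat=n)}
    return set(sigma) == full

def is_non_adaptive(sigma, observations):
    # Non-adaptive: complete, and the action depends only on the sequence length
    if not is_complete(sigma, observations):
        return False
    per_level = {}
    for ys, a in sigma.items():
        per_level.setdefault(len(ys), set()).add(a)
    return all(len(acts) == 1 for acts in per_level.values())

def from_list(actions, observations):
    # Expand the list form [a_1, ..., a_l] into the dictionary form
    return {ys: a for n, a in enumerate(actions) for ys in product(observations, repeat=n)}

OBS = ("y", "y'")  # assumed observation set
sigma = {(): "a", ("y",): "b"}
sigma_prime = {(): "a", ("y",): "b", ("y'",): "c", ("y", "y'"): "d"}
```

With this encoding, `sigma` has length 2 and `sigma_prime` has length 3, one more than the height of the corresponding trees.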


Remark 2.6. It is worthwhile to comment on two possible objections to our strategy 
model. First, we do not consider mixed strategies, that is strategies where the next action 
is chosen probabilistically, rather than deterministically like in our case. It is true that 
mixed strategies play a key role in Game Theory: equilibria typically arise in the form of 
profiles of mixed strategies, as in many games any pure (deterministic) strategy could be 
easily beaten by an opponent. We believe, however, that mixed strategies are irrelevant 
in the present context: there is only one player here (the adversary), and consequently no 
meaningful notion of equilibrium. In particular, there is no such role as a defender whose 
moves depend on the adversary’s ones. In game-theoretical terms, the adversary is playing
against Nature, represented by the mechanism.

Another possible objection is that in our model the next action depends solely on the
sequence of past observations, rather than on the whole history comprising also the past
actions played by the adversary. This limitation can be easily overcome by considering a
modified action-based mechanism, where actions are part of the observation: that is, in the
new mechanism, the set of observations is $Act \times \mathcal{Y}$, and one poses $p_a((b,y)|x) = p_a(y|x)$ if
$a = b$ and $p_a((b,y)|x) = 0$ otherwise, where $p_a(\cdot|\cdot)$ on the right-hand side is the $a$-stochastic
matrix of the original mechanism. This way, strategies for the new mechanism automatically
take into account the whole history.


2.2. A probability space. Informally, we consider an adversary who repeatedly queries
a mechanism, according to a predefined finite strategy. At some point, the strategy will
terminate, and the adversary will have collected a sequence of observations $y^n = (y_1, \ldots, y_n)$.
Note that both the length $n$ and the probability of the individual observations $y_i$, hence of
the whole $y^n$, will in general depend both on $X$ and on the strategy played by the adversary.
In other words, the distribution $p(\cdot)$ of $X$ and the strategy $\sigma$ together induce a probability
distribution on a subset of all observation sequences: the ones that may arise as a result of
a complete interaction with the mechanism, according to the played strategy.

Formally, let $p(\cdot)$ be any given probability distribution over $\mathcal{X}$, which we will often refer
to as the prior. For each finite strategy $\sigma$, we define a joint probability distribution $p_\sigma(\cdot)$
on $\mathcal{X} \times \mathcal{Y}^*$, depending on $\sigma$ and on $p(\cdot)$, as follows. We let $p_\sigma(x, \varepsilon) = 0$ and, for each $j \ge 0$:

$$p_\sigma(x, y_1, \ldots, y_j, y_{j+1}) =
\begin{cases}
p(x) \cdot p_{a_1}(y_1|x) \cdots p_{a_j}(y_j|x)\, p_{a_{j+1}}(y_{j+1}|x) & \text{if } y^j \in \mathrm{dom}(\sigma),\ y^{j+1} \notin \mathrm{dom}(\sigma) \\
0 & \text{otherwise}
\end{cases}
\qquad (2.1)$$

where in the first case $a_i = \sigma(y^{i-1})$ for $i = 1, \ldots, j+1$. In case $\sigma = [a]$, a single action
strategy, we will often abbreviate $p_{[a]}(\cdot)$ as $p_a(\cdot)$. Note that the support of $p_\sigma(\cdot)$ is finite, in
particular $\mathrm{supp}(p_\sigma) \subseteq \mathcal{X} \times \{y^j y : j \ge 0,\ y^j \in \mathrm{dom}(\sigma),\ y^j y \notin \mathrm{dom}(\sigma)\}$.

Let $(X, Y)$ be a pair of random variables with outcomes in $\mathcal{X} \times \mathcal{Y}^*$, jointly distributed
according to $p_\sigma(\cdot)$: here $X$ represents the secret and $Y$ represents the sequence of observations
obtained upon termination of the strategy. We shall often use such shortened notations as:
$p_\sigma(x|y^n)$ for $\Pr(X = x \mid Y = y^n)$, $p_\sigma(y^n)$ for $\Pr(Y = y^n)$, and so on. Explicit formulas for
computing these quantities can be easily derived from the definition of $p_\sigma(\cdot)$ and using Bayes
rule. We will normally keep the dependence of $(X, Y)$ on $p(\cdot)$ and $\sigma$ implicit. When we
want to stress that we are considering $Y$ according to the distribution induced by a specific
$\sigma$ (e.g. because different strategies are being considered at the same time), we will write it
as $Y_\sigma$.


Consider a prior $p(\cdot)$ and a finite strategy $\sigma$, and the corresponding pair of random
variables (r.v.) $(X, Y)$. We define the following quantities, expressing average uncertainty,
conditional uncertainty and information gain about $X$, that may result from interaction
according to strategy $\sigma$ (by convention, below we let $y^n$ range over sequences with
$p_\sigma(y^n) > 0$):

$$\begin{aligned}
U(X) &= U(p) \\
U_\sigma(X|Y) &= \sum_{y^n} p_\sigma(y^n)\, U(p_\sigma(\cdot|y^n)) \qquad (2.2) \\
I_\sigma(X;Y) &= U(X) - U_\sigma(X|Y).
\end{aligned}$$

Again, we may drop the subscript $\sigma$ from $U_\sigma$ and $I_\sigma$ if the strategy $\sigma$ is clear from the
context. Note that, in the case of Shannon entropy, $I_\sigma(X;Y)$ coincides with the familiar
mutual information, traditionally measured in bits. In the case of error entropy, $I_\sigma(X;Y)$
is what is called additive leakage in the QIF literature and advantage in the cryptographic
literature, see e.g. [20] and references therein.
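
The definitions (2.1) and (2.2) translate directly into a short computation. The following is a minimal sketch under stated assumptions, not the authors' implementation: a mechanism is a hypothetical function `mech(a, x)` returning the row $p_a(\cdot|x)$ as a dictionary, a finite strategy is a dictionary from tuples of observations to actions, and the toy mechanism (two noiseless queries revealing the low and high bit of a 2-bit secret) is invented for illustration.

```python
from math import log2

def shannon(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

def error_entropy(p):
    return 1 - max(p.values())

def leakage(prior, mech, sigma, U):
    # joint maps each complete observation sequence ys to {x: p_sigma(x, ys)}
    joint = {}
    frontier = [((), {x: px for x, px in prior.items() if px > 0})]
    while frontier:
        ys, weights = frontier.pop()
        if ys not in sigma:              # the strategy has terminated on ys
            joint[ys] = weights
            continue
        successors = {}
        for x, w in weights.items():
            for y, q in mech(sigma[ys], x).items():
                if q > 0:
                    successors.setdefault(ys + (y,), {})[x] = w * q
        frontier.extend(successors.items())
    # U_sigma(X|Y) = sum over ys of p_sigma(ys) * U(p_sigma(.|ys))
    conditional = 0.0
    for weights in joint.values():
        p_y = sum(weights.values())
        conditional += p_y * U({x: w / p_y for x, w in weights.items()})
    return U(prior) - conditional

def toy_mech(a, x):
    # Hypothetical deterministic mechanism on secrets 0..3
    return {(x % 2 if a == "low" else x // 2): 1.0}

toy_prior = {x: 0.25 for x in range(4)}
toy_sigma = {(): "low", (0,): "high"}   # ask the high bit only after observing low bit 0

i_shannon = leakage(toy_prior, toy_mech, toy_sigma, shannon)
i_error = leakage(toy_prior, toy_mech, toy_sigma, error_entropy)
```

In this toy case the adversary fully identifies secrets 0 and 2 and is left with a uniform belief on $\{1, 3\}$ otherwise, so the Shannon leakage is $2 - 0.5 = 1.5$ bits and the error-entropy leakage is $0.75 - 0.25 = 0.5$.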

In the rest of the paper, unless otherwise stated, we let $U(\cdot)$ be an arbitrary uncertainty
function. The following fact about $I_\sigma(X;Y)$ follows from $U(\cdot)$’s concavity and Jensen’s
inequality, plus routine calculations on probability distributions (see Appendix).

Lemma 2.7. $I_\sigma(X;Y) \ge 0$. Moreover $I_\sigma(X;Y) = 0$ if $X$ and $Y$ are independent.

Given the definitions in (2.2), adaptive QIF can be defined quite simply.

Definition 2.8 (QIF under adaptive adversaries). Let $\mathcal{S}$ be a mechanism and $p(\cdot)$ be a prior
over $\mathcal{X}$.

(1) For a finite strategy $\sigma$, let $I_\sigma(\mathcal{S}, p) = I_\sigma(X;Y)$.

(2) For an infinite strategy $\sigma$, let $I_\sigma(\mathcal{S}, p) = \lim_{l \to \infty} I_{\sigma \backslash l}(\mathcal{S}, p)$.

(3) (Maximum IF under $p(\cdot)$) $I_\star(\mathcal{S}, p) = \sup_\sigma I_\sigma(\mathcal{S}, p)$.

Note that $l' \ge l$ implies $I_{\sigma \backslash l'}(\mathcal{S}, p) \ge I_{\sigma \backslash l}(\mathcal{S}, p)$, hence the limit in (2) always exists.
Taking the distribution that achieves the maximum leakage, we can define an analog of
channel capacity.

Definition 2.9 (Adaptive secrecy capacity). $C(\mathcal{S}) = \sup_{p \in \mathcal{P}} I_\star(\mathcal{S}, p)$.

2.3. Attack Trees. It is sometimes useful to work with a pictorial representation of the 
adversary’s attack steps, under a given strategy and prior. This can take the form of a tree, 
where each node represents an adversary’s belief about the secret, that is, a probability 
distribution over X. The tree describes the possible evolutions of the belief, depending on 
the strategy and on the observations. We formally introduce such a representation below: it 
will be extensively used in the examples. Note that attack trees are different from strategy 
trees. 

A history is a sequence $h \in (Act \times \mathcal{Y})^*$. Let $h = (a_1, y_1, \ldots, a_n, y_n)$ be such a history.
Given a prior $p(\cdot)$, we define the update of $p(\cdot)$ after $h$, denoted by $p^h(\cdot)$, as the distribution
on $\mathcal{X}$ defined by

$$p^h(x) = p_{\sigma_h}(x \mid y^n) \qquad (2.3)$$

where $\sigma_h = [a_1, \ldots, a_n]$, provided $p_{\sigma_h}(y^n) > 0$; otherwise $p^h(\cdot)$ is undefined.

The attack tree induced by a strategy $\sigma$ and a prior $p(\cdot)$ is a tree with nodes labelled by
probability distributions over $\mathcal{X}$ and arcs labelled with pairs $(y, \lambda)$ of an observation $y$ and
a probability value $\lambda$. This tree is obtained from the strategy tree of $\sigma$ as follows. First,
note that, in a strategy tree, each node can be identified with the unique history from the
root leading to it. Given the strategy tree for $\sigma$: (a) for each $y \in \mathcal{Y}$ and each node missing
an outgoing $y$-labelled arc, attach a new $y$-labelled arc leading to a new node; (b) label
each node of the resulting tree by $p^h(\cdot)$, where $h$ is the history identifying the node, if $p^h(\cdot)$
is defined, otherwise remove the node and its descendants, as well as the incoming arc; (c)
label each arc from a node $h$ to a child, represented by a history $h \cdot a \cdot y$, in the resulting
tree with $\lambda = p^h_a(y)$, to be parsed as $(p^h)_{[a]}(y)$. This is the probability of observing $y$ under
a prior $p^h(\cdot)$ when submitting action $a$.

The concept of attack trees is demonstrated by a few examples in the next section. Here,
we just note the following easy to check facts. For each leaf $h$ of the attack tree: (i) the
leaf’s label is $p^h(\cdot) = p_\sigma(\cdot \mid y^n)$, where $y^n$ is the sequence of observations in $h$; (ii) if we let $\pi_h$
be the product of the probabilities on the edges from the root to the leaf, then $\pi_h = p_\sigma(y^n)$.
Moreover, (iii) each $y^n$ s.t. $p_\sigma(y^n) > 0$ is found in the tree. As a consequence, for a finite
strategy, taking (2.2) into account, the uncertainty of $X$ given $Y$ can be computed from the
attack tree as:

$$U_\sigma(X \mid Y) = \sum_{h \text{ is a leaf}} \pi_h\, U(p^h). \qquad (2.4)$$

id   ZIP   Age   Date   Disease
1    z1    65    d2     Heart disease
2    z1    65    d2     Flu
3    z1    67    d2     Short breath
4    z1    68    d1     Obesity
5    z1    68    d1     Heart disease
6    z3    66    d2     Heart disease
7    z3    67    d2     Obesity
8    z3    31    d2     Short breath
9    z2    30    d3     Heart disease
10   z2    31    d3     Obesity

Figure 2. Medical db of Example 3.1

Figure 3. Strategy tree of Example 3.1


3. Examples 

We present a few instances of the framework introduced in the previous section. We
emphasize that these examples are quite simple and only serve to illustrate our main definitions.
In the rest of the paper, we shall use the following notation: we let $u\{x_1, \ldots, x_k\}$ denote
the uniform distribution on $\{x_1, \ldots, x_k\}$.

Example 3.1 (medical db). An attacker gets hold of the table shown in Figure 2, which
represents a fragment of a hospital’s database. Each row of the table contains: a numerical
id followed by the ZIP code, age, discharge date and disease of an individual that has been
recently hospitalized. The table does not contain personal identifiable information. The
attacker gets to know that a certain target individual, John Doe (JD), has been recently
hospitalized. However, the attacker is ignorant of the corresponding id in the table and any
information about JD, apart from his name. The attacker’s task is to identify JD, i.e. to
find JD’s id in the table, thus learning his disease. The attacker is in a position to ask a
source, perhaps the hospital DB, queries concerning non sensitive information (ZIP code,
age and discharge date) of any individual, including JD, and compare the answers with the
table’s entries.

This situation can be modeled quite simply as an action-based mechanism $\mathcal{S}$, as follows.
We pose: $Act = \{ZIP, Age, Date\}$; $\mathcal{X} = \{1, \ldots, 10\}$, the set of possible id’s, and $\mathcal{Y} =
\mathcal{Y}_{ZIP} \cup \mathcal{Y}_{Age} \cup \mathcal{Y}_{Date}$, where $\mathcal{Y}_{ZIP} = \{z_1, z_2, z_3\}$, $\mathcal{Y}_{Age} = \{30, 31, 65, 66, 67, 68\}$ and
$\mathcal{Y}_{Date} = \{d_1, d_2, d_3\}$. The conditional probability matrices reflect the behaviour of the source when
queried about ZIP code, age and discharge date of an individual. We assume that the source
is truthful, hence answers will match the entries of the table. For example, $p_{Age}(y|1) = 1$
if $y = 65$ and 0 otherwise; $p_{ZIP}(y|2) = 1$ if $y = z_1$, 0 otherwise; and so on. Note that this
defines a deterministic mechanism. Finally, since the attacker has no clues about JD’s id,
we set the prior to be the uniform distribution on $\mathcal{X}$, $p(\cdot) = u\{1, \ldots, 10\}$.

Assume now that, possibly to protect privacy of individuals, the number of queries to
the source about any individual is limited to two. Figure 3 displays a possible attacker’s
strategy $\sigma$, of length 2. Figure 4 displays the corresponding attack tree, under the given
prior. Note that the given strategy is not in any sense optimal. Assume we set $U(\cdot) = H(\cdot)$,
Shannon entropy, as a measure of uncertainty. Using (2.4), we can compute $I_\sigma(\mathcal{S}, p) =
H(X) - H(X|Y) = \log 10 - \frac{3}{10} \log 3 - \frac{2}{5} \approx 2.45$ bits. With $U(\cdot) = E(\cdot)$, the error entropy,
we have $I_\sigma(\mathcal{S}, p) = E(X) - E(X|Y) = 0.5$.

Footnote: That this is unsafe is of course well-known from database security: the present
example only serves the purpose of illustration.

Figure 4. The attack tree for Example 3.1

Figure 5. The attack tree for Example 3.2. Leaves with the same label and their incoming
arcs have been coalesced.
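
The two values reported for Example 3.1 can be cross-checked numerically. The sketch below is hedged in two ways: the table rows are transcribed from Figure 2, and, since the strategy tree of Figure 3 is not reproduced in the text, the dictionary `sigma` is an assumed strategy (ZIP first, then Date after $z_1$ or $z_2$, and Age after $z_3$) chosen because it is consistent with the reported figures. All function names are hypothetical.

```python
from math import log2

TABLE = {1: ("z1", 65, "d2"), 2: ("z1", 65, "d2"), 3: ("z1", 67, "d2"),
         4: ("z1", 68, "d1"), 5: ("z1", 68, "d1"), 6: ("z3", 66, "d2"),
         7: ("z3", 67, "d2"), 8: ("z3", 31, "d2"), 9: ("z2", 30, "d3"),
         10: ("z2", 31, "d3")}
COLUMN = {"ZIP": 0, "Age": 1, "Date": 2}

def mech(a, x):
    # Truthful source: the answer is the table entry, with probability 1
    return {TABLE[x][COLUMN[a]]: 1.0}

def shannon(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

def error_entropy(p):
    return 1 - max(p.values())

def leakage(prior, mech, sigma, U):
    joint, frontier = {}, [((), dict(prior))]
    while frontier:
        ys, weights = frontier.pop()
        if ys not in sigma:              # complete observation sequence
            joint[ys] = weights
            continue
        successors = {}
        for x, w in weights.items():
            for y, q in mech(sigma[ys], x).items():
                if q > 0:
                    successors.setdefault(ys + (y,), {})[x] = w * q
        frontier.extend(successors.items())
    conditional = 0.0
    for weights in joint.values():
        p_y = sum(weights.values())
        conditional += p_y * U({x: w / p_y for x, w in weights.items()})
    return U(prior) - conditional

prior = {x: 0.1 for x in TABLE}
# Assumed strategy of length 2 (not taken from the source figure)
sigma = {(): "ZIP", ("z1",): "Date", ("z2",): "Date", ("z3",): "Age"}

i_shannon = leakage(prior, mech, sigma, shannon)
i_error = leakage(prior, mech, sigma, error_entropy)
```

Under this assumed strategy the leaves of the attack tree are $u\{1,2,3\}$, $u\{4,5\}$, $u\{9,10\}$ and three point masses on 6, 7 and 8, which gives the conditional entropy $\frac{3}{10}\log 3 + \frac{2}{5}$ used above.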


Example 3.2 (noisy version). We consider a version of the previous mechanism where the
public source queried by the attacker is not entirely truthful. In particular, for security
reasons, whenever queried about age of an individual, the source adds a uniformly distributed
offset $r \in \{-1, 0, +1\}$ to the real answer. The only difference from the previous example is
that the conditional probability matrix $p_{Age}(\cdot|\cdot)$ is not deterministic anymore. For example,
for $x = 1$, we have

$$p_{Age}(y|1) =
\begin{cases}
\frac{1}{3} & \text{if } y \in \{64, 65, 66\} \\
0 & \text{otherwise}
\end{cases}$$

(also note that we have to insert 29, 32, 64 and 69 as possible mechanism’s observations into
$\mathcal{Y}_{Age}$). Figure 5 shows the attack tree induced by the strategy $\sigma$ of Figure 3 and the uniform
prior in this case. If $U(\cdot) = H(\cdot)$ we obtain $I_\sigma(\mathcal{S}, p) = \log 10 - \frac{3}{10} \log 3 - \frac{8}{15} \approx 2.31$ bits; if
$U(\cdot) = E(\cdot)$, instead, $I_\sigma(\mathcal{S}, p) = \frac{13}{30} \approx 0.43$.
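
The noisy variant can be checked in the same way. This is again a sketch under assumptions: only the Age query is randomized, with an offset drawn uniformly from $\{-1, 0, +1\}$ as described above, and the strategy `sigma` is the same assumed strategy (ZIP, then Date after $z_1$ or $z_2$, Age after $z_3$), since Figure 3 is not reproduced in the text. Names are hypothetical.

```python
from math import log2

TABLE = {1: ("z1", 65, "d2"), 2: ("z1", 65, "d2"), 3: ("z1", 67, "d2"),
         4: ("z1", 68, "d1"), 5: ("z1", 68, "d1"), 6: ("z3", 66, "d2"),
         7: ("z3", 67, "d2"), 8: ("z3", 31, "d2"), 9: ("z2", 30, "d3"),
         10: ("z2", 31, "d3")}
COLUMN = {"ZIP": 0, "Age": 1, "Date": 2}

def noisy_mech(a, x):
    value = TABLE[x][COLUMN[a]]
    if a == "Age":
        # Uniform offset in {-1, 0, +1} added to the true age
        return {value + r: 1 / 3 for r in (-1, 0, 1)}
    return {value: 1.0}

def shannon(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

def error_entropy(p):
    return 1 - max(p.values())

def leakage(prior, mech, sigma, U):
    joint, frontier = {}, [((), dict(prior))]
    while frontier:
        ys, weights = frontier.pop()
        if ys not in sigma:              # complete observation sequence
            joint[ys] = weights
            continue
        successors = {}
        for x, w in weights.items():
            for y, q in mech(sigma[ys], x).items():
                if q > 0:
                    successors.setdefault(ys + (y,), {})[x] = w * q
        frontier.extend(successors.items())
    conditional = 0.0
    for weights in joint.values():
        p_y = sum(weights.values())
        conditional += p_y * U({x: w / p_y for x, w in weights.items()})
    return U(prior) - conditional

prior = {x: 0.1 for x in TABLE}
sigma = {(): "ZIP", ("z1",): "Date", ("z2",): "Date", ("z3",): "Age"}  # assumed strategy

i_shannon = leakage(prior, noisy_mech, sigma, shannon)
i_error = leakage(prior, noisy_mech, sigma, error_entropy)
```

The only change with respect to the noiseless case is in the $z_3$ branch, where the noisy answers 66 and 67 leave the adversary with the belief $u\{6, 7\}$; this accounts for the lower leakage.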


Example 3.3 (cryptographic devices). We can abstractly model a cryptographic device as
a function $f$ taking pairs of a key and a message into observations, thus, $f : \mathcal{K} \times \mathcal{M} \to \mathcal{Y}$.
Assume the attacker can choose the message $m \in \mathcal{M}$ fed to the device, while the private
key $k$ is fixed and unknown to him. This clearly yields an action-based mechanism $\mathcal{S}$ where
$\mathcal{X} = \mathcal{K}$, $Act = \mathcal{M}$ and $\mathcal{Y}$ are the observations. If we assume the observations noiseless, then
the conditional probability matrices are defined by

$$p_m(y|k) = 1 \text{ iff } f(k, m) = y.$$

M. BOREALE AND F. PAMPALONI 


We obtain therefore a deterministic mechanism. This is the way, for example, modular
exponentiation is modeled in [26]. More realistically, the observations will be noisy, due e.g.
to the presence of “algorithmic noise”. For example, assume $\mathcal{Y} \subseteq \mathbb{N}$ is the set of possible
Hamming weights of the ciphertexts (this is related to power analysis attacks, see e.g. [25]).
Then we may set

$$p_m(y|k) = \Pr(f(k, m) + N = y)$$

where $N$ is a random variable modelling noise. For example, in a model of DES S-Boxes,
$\mathcal{K} = \mathcal{M} = \{0,1\}^6$, while $\mathcal{Y} = \{0, 1, 2, \ldots\}$ is the set of observations: the
(noisy) Hamming weight of the outputs of the target S-Box. In this case, $N$ is taken to
be the cumulative weight of the seven S-Boxes other than the target one. It is sensible to
assume this noise to be binomially distributed: $N \sim B(m, p)$, with $m = 28$ and $p = \frac{1}{2}$. See
[7] for details.


4. Comparing Adaptive and Non-adaptive Strategies 

Conceptually, we can broadly classify mechanisms into two categories, depending on the 
size of the set Act. The first category consists of systems with a huge - exponential, in 
the size of any reasonable syntactic description - number of actions. The second category 
consists of systems with an “affordable” number of actions. In the first category, we find, 
for instance, complex cryptographic hardware, possibly described via boolean circuits or
other “succinct” notations (cf. the public key exponentiation algorithms considered in [26]).
In the second category, we find systems explicitly described by tables, such as databases
(Examples 3.1 and 3.2) and S-Boxes (Example 3.3). It makes sense to assess the difficulty of
analysing the security of mechanisms separately for these two categories.

4.1. Systems in Succinct Form. We argue that the analysis of such systems is in general 
an intractable problem, even if restricted to simple special instances of the non-adaptive 
case. We consider the problem of deciding if there is a finite strategy over a given time 
horizon yielding an information flow exceeding a given threshold. This decision problem is 
of course simpler than the problem of finding an optimal strategy over a finite time horizon: 
indeed, any algorithm for finding the optimal strategy can also be used to answer the first 
problem. We give some definitions. 

Definition 4.1 (Systems in boolean forms). Let $t, u, v$ be nonnegative integers. We say
a mechanism $\mathcal{S} = (\mathcal{X}, \mathcal{Y}, Act, \{M_a : a \in Act\})$ is in $(t,u,v)$-boolean form if $\mathcal{X} = \{0,1\}^t$,
$Act = \{0,1\}^u$, $\mathcal{Y} = \{0,1\}^v$ and there is a boolean function $f : \{0,1\}^{t+u} \to \{0,1\}^v$ such that
for each $x \in \mathcal{X}$, $y \in \mathcal{Y}$ and $a \in Act$, $p_a(y|x) = 1$ iff $f(x, a) = y$. The size of $\mathcal{S}$ is defined as
the syntactic size of the smallest boolean formula for $f$.

It is not difficult to see that the class of boolean forms coincides, up to suitable encodings,
with that of deterministic systems.

Definition 4.2 (Adaptive Bounding Problem in succinct form, ABPS). Given a mechanism
$\mathcal{S}$ in a $(t,u,v)$-boolean form, represented by a boolean expression, a prior distribution $p(\cdot)$,
$l \ge 1$ and $T \ge 0$, decide if there is a strategy $\sigma$ of length $\le l$ such that $I_\sigma(\mathcal{S}, p) > T$.

In the following theorem, we shall assume, for simplicity, the following reasonable
properties of $U(\cdot)$: if $p(\cdot)$ concentrates all the probability mass on a single element, and $q(\cdot)$ is
the uniform distribution, then $0 = U(p) < U(q)$. A slight modification of the argument also
works without this assumption. The theorem says that even length 1 (hence non-adaptive)
strategies are difficult to assess.

Theorem 4.3. Assume $U(\cdot)$ satisfies the above stated property. Then the ABPS is NP-hard,
even if fixing $t = v = l = 1$, and $T = 0$.

Proof. We reduce from the satisfiability problem for boolean formulae. Let $\phi(z_1, \ldots, z_u) =
\phi(z)$ be an arbitrary boolean formula with $u$ free boolean variables $z_1, \ldots, z_u$. We show how
to build in polynomial time out of $\phi(z)$ a mechanism $\mathcal{S}$ in $(1, u, 1)$-boolean form, and a prior
$p(\cdot)$, with the following property: there is a length 1 strategy $\sigma$ s.t. $I_\sigma(\mathcal{S}, p) > 0$ iff $\phi(z)$ is
satisfiable. Take $\mathcal{X} = \mathcal{Y} = \{0,1\}$ and $Act = \{0,1\}^u$. Let the mechanism $\mathcal{S}$ be defined by the
boolean function $f(x, z_1, \ldots, z_u) = x \wedge \phi(z)$. Let $p(\cdot)$ be the uniform prior on $\mathcal{X} = \{0,1\}$.
Now, if there is an action $b = (b_1, \ldots, b_u) \in Act$ such that $\phi(b) = 1$ ($\phi(z)$ is satisfiable) then
clearly we will have that $Y = X \wedge \phi(b)$ is logically equivalent to $X$, hence $U(X|Y) = 0$.
Consequently, setting $\sigma = [\varepsilon \mapsto b]$, we will have that $I_\sigma(\mathcal{S}, p) = U(X) - U(X|Y) > 0$. On
the other hand, if $\phi(z)$ is not satisfiable, then for any $b \in Act$ we will have that $Y = X \wedge \phi(b)$
is logically equivalent to 0, hence $U(X|Y) = U(X)$. Consequently, for any $\sigma = [\varepsilon \mapsto b]$, we
will have $I_\sigma(\mathcal{S}, p) = U(X) - U(X|Y) = 0$. □
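
The reduction in the proof can be illustrated on small formulae. The sketch below is only a demonstration under assumptions: a formula is represented as a hypothetical Python callable on a tuple of bits, Shannon entropy is used as $U$, and the search over actions is a brute-force enumeration of all $2^u$ assignments, which is exactly the exponential blow-up the theorem points at, not an efficient procedure.

```python
from itertools import product
from math import log2

def shannon(probs):
    return -sum(q * log2(q) for q in probs if q > 0)

def leakage_of_action(phi, b):
    # Mechanism f(x, z) = x AND phi(z), uniform prior on X = {0, 1}, strategy [eps -> b]
    prior = {0: 0.5, 1: 0.5}
    blocks = {}                       # observation y -> weights of the secrets mapped to y
    for x, px in prior.items():
        y = int(bool(x) and bool(phi(b)))
        blocks.setdefault(y, []).append(px)
    conditional = sum(sum(ws) * shannon([w / sum(ws) for w in ws])
                      for ws in blocks.values())
    return shannon(prior.values()) - conditional

def some_action_leaks(phi, u):
    # True iff some length-1 strategy has positive leakage, i.e. iff phi is satisfiable
    return any(leakage_of_action(phi, b) > 0 for b in product((0, 1), repeat=u))

phi_sat = lambda z: (z[0] or z[1]) and not z[2]   # satisfiable, e.g. by z = (1, 0, 0)
phi_unsat = lambda z: z[0] and not z[0]           # unsatisfiable
```

Here `some_action_leaks(phi_sat, 3)` holds while `some_action_leaks(phi_unsat, 1)` does not, matching the two directions of the proof.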

We should stress again that the above result concerns the difficulty of analyzing succinct
mechanisms under the simplest possible form of attacker; by no means does it entail that
adaptive and non-adaptive attackers are equally effective. The following example should
clarify that between the two forms of attacker, there can be a huge difference in terms of
effectiveness.

Example 4.4 (envelopes). A secret bit $s \in \{b_0, b_1\}$ and a numbered envelope $e \in \{1,\ldots,N\}$ are drawn according to some distribution $p$. A piece of paper with the value of $s$ written on it is put into $e$. All other envelopes are filled with a piece of paper revealing the envelope containing the secret, that is $e$. The adversary can choose and open any of the envelopes and examine its content; that is, actions of this mechanism are envelope numbers.

Assume the envelope is chosen uniformly at random. Clearly, the obvious adaptive strategy leads the adversary to discover the secret after at most two actions. On the other hand, a non-adaptive, brute-force strategy will lead him to examine one envelope after the other, ignoring the suggestion given in the envelopes opened so far, leading to a strategy of length $N - 1$.
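The gap between the two attackers can be illustrated with a small count. This is only a sketch under assumptions: envelopes are numbered from 1, the adaptive attacker always starts from envelope 1, and the function names are hypothetical. The fixed-order count below is the position of the secret envelope in the chosen order, which is meant to show the linear growth in $N$ and not to reproduce the exact length stated above.

```python
def adaptive_openings(e, first=1):
    """Envelopes opened by the obvious adaptive strategy: open `first`;
    if it does not hold the secret, it names envelope e, so open e next."""
    return 1 if e == first else 2

def fixed_order_openings(e, order):
    """Envelopes a fixed (non-adaptive) order opens up to and including
    envelope e, the one that actually holds the secret."""
    return order.index(e) + 1

N = 10
order = list(range(1, N + 1))
worst_adaptive = max(adaptive_openings(e) for e in order)
worst_fixed = max(fixed_order_openings(e, order) for e in order)
```

Here `worst_adaptive` stays at 2 for every `N`, while `worst_fixed` grows with the number of envelopes.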

4.2. General Systems. The following results apply in general, but they are particularly significant for systems with a moderate number of actions. The next theorem essentially says that, up to an expansion factor bounded by $|Act|$, non-adaptive strategies are as efficient as adaptive ones. In fact, given any strategy $\sigma$, one can construct a non-adaptive $\sigma'$ that is only moderately larger than $\sigma$ and achieves at least the same leakage, as follows. In any history induced by $\sigma$, each action can occur at most $l$ times, where $l$ is the length of $\sigma$, and the order in which different actions appear in the history is not relevant as to the final belief that is obtained. For any history of $\sigma$ to be simulated by a history of $\sigma'$, it is therefore enough that the latter offers all the distinct actions offered by $\sigma$, each repeated $l$ times. Note that, for a strategy $\sigma$, the number of distinct actions that appear in $\sigma$ is $|\mathrm{range}(\sigma)|$.

Theorem 4.5. For each finite strategy $\sigma$ of length $l$ it is possible to build a non-adaptive finite strategy $\sigma'$ of length $|\mathrm{range}(\sigma)| \times l$, such that $I_{\sigma'}(\mathcal{S},p) \geq I_\sigma(\mathcal{S},p)$.


Proof. Let $\mathrm{range}(\sigma) = \{a_1,\ldots,a_h\}$ and let $\sigma'$ be any non-adaptive strategy that plays each of $a_1,\ldots,a_h$ for $l$ times, for example, $\sigma' = [a_1,\ldots,a_h,\cdots,a_1,\ldots,a_h]$ ($l$ times); note that the length of $\sigma'$ is $h \times l$, as required. For any $y^j$ ($j \leq l$), we shall denote by $\sigma' - y^j$ the non-adaptive strategy of length $h \times l - j$ obtained by removing from $\sigma'$, seen as a list, $j$ actions $b_1,\ldots,b_j$, where $b_1 = \sigma(\varepsilon),\ldots,b_j = \sigma(y^{j-1})$.

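Denote by $Y_\sigma$ and $Y_{\sigma'}$ the r.v. on $\mathcal{Y}^*$ corresponding to $\sigma$ and $\sigma'$, respectively. We will show that $U(X|Y_\sigma) \geq U(X|Y_{\sigma'})$. Take any $x \in \mathcal{X}$ s.t. $p(x) > 0$ and $y^j \in \mathrm{dom}(p_\sigma)$. We note that, for any sequence $y^{hl-j}$ and for an appropriate interleaving of the two sequences $y^{hl-j}$ and $y^j$ that here we denote by just $y^{hl-j}, y^j$, we have that

$$p_{\sigma'-y^j}(y^{hl-j}|x)\, p_\sigma(y^j|x) = p_{\sigma'}(y^{hl-j}, y^j|x). \tag{4.1}$$

From (4.1), it follows that

$$p_\sigma(y^j|x) = \sum_{y^{hl-j}} p_{\sigma'-y^j}(y^{hl-j}|x)\, p_\sigma(y^j|x) = \sum_{y^{hl-j}} p_{\sigma'}(y^{hl-j}, y^j|x). \tag{4.2}$$

Now, for any $x$ and $y^j$ such that $p(x) > 0$ and $p_\sigma(y^j) > 0$, we have the following.

$$p_\sigma(x|y^j) = \frac{p_\sigma(y^j|x)\,p(x)}{p_\sigma(y^j)} = \sum_{y^{hl-j}} p_{\sigma'}(y^{hl-j}, y^j|x)\,\frac{p(x)}{p_\sigma(y^j)} \tag{4.3}$$

$$\phantom{p_\sigma(x|y^j)} = \sum_{y^{hl-j}} \frac{p_{\sigma'}(x|y^{hl-j}, y^j)\,p_{\sigma'}(y^{hl-j}, y^j)}{p(x)}\,\frac{p(x)}{p_\sigma(y^j)}$$

$$\phantom{p_\sigma(x|y^j)} = \sum_{y^{hl-j}} p_{\sigma'}(x|y^{hl-j}, y^j)\,\frac{p_{\sigma'}(y^{hl-j}, y^j)}{p_\sigma(y^j)} \tag{4.4}$$

where in the second equality of (4.3) we have applied (4.2). It is an easy matter to show that $\sum_{y^{hl-j}} \frac{p_{\sigma'}(y^{hl-j}, y^j)}{p_\sigma(y^j)} = 1$ (this is basically a consequence of (4.1); we leave the details to the interested reader). Thus (4.4) shows that $p_\sigma(\cdot|y^j)$ can be expressed as a convex combination of the distributions $p_{\sigma'}(\cdot|y^{hl-j}, y^j)$, for $y^{hl-j} \in \mathcal{Y}^{hl-j}$. Using this fact, the concavity of $U(\cdot)$ and Jensen's inequality, we arrive at the following.

$$U(p_\sigma(\cdot|y^j)) \geq \sum_{y^{hl-j}} \frac{p_{\sigma'}(y^{hl-j}, y^j)}{p_\sigma(y^j)}\, U(p_{\sigma'}(\cdot|y^{hl-j}, y^j)). \tag{4.5}$$

We finally can compute the following lower bound for $U(X|Y_\sigma)$:

$$U(X|Y_\sigma) = \sum_{y^j} p_\sigma(y^j)\, U(p_\sigma(\cdot|y^j))$$

$$\phantom{U(X|Y_\sigma)} \geq \sum_{y^j} p_\sigma(y^j) \sum_{y^{hl-j}} \frac{p_{\sigma'}(y^{hl-j}, y^j)}{p_\sigma(y^j)}\, U(p_{\sigma'}(\cdot|y^{hl-j}, y^j))$$

$$\phantom{U(X|Y_\sigma)} = \sum_{y^j} \sum_{y^{hl-j}} p_{\sigma'}(y^{hl-j}, y^j)\, U(p_{\sigma'}(\cdot|y^{hl-j}, y^j))$$

$$\phantom{U(X|Y_\sigma)} = \sum_{y^{hl}} p_{\sigma'}(y^{hl})\, U(p_{\sigma'}(\cdot|y^{hl})) = U(X|Y_{\sigma'}) \tag{4.6}$$

where the inequality follows from (4.5).

□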
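The construction of $\sigma'$ used in the proof is purely syntactic and can be sketched in a few lines. The representation below is an assumption made for illustration: a finite strategy is a dictionary from tuples of observations (histories, with the empty tuple for $\varepsilon$) to actions, and `flatten` is a hypothetical name.

```python
def strategy_length(sigma):
    """Length l of a finite strategy given as {observation-tuple: action}."""
    return 1 + max(len(history) for history in sigma)

def flatten(sigma):
    """Non-adaptive strategy that plays every action in range(sigma),
    in a fixed order, repeated l times: [a_1..a_h, ..., a_1..a_h]."""
    distinct = sorted(set(sigma.values()))
    return distinct * strategy_length(sigma)

# A hypothetical adaptive strategy of length 2 over observations {0, 1}.
sigma = {(): "a", (0,): "b", (1,): "c"}
sigma_prime = flatten(sigma)  # ['a', 'b', 'c', 'a', 'b', 'c']
```

The resulting list has length $|\mathrm{range}(\sigma)| \times l = 3 \times 2$, matching the bound in Theorem 4.5; the sketch does not compute leakage, it only builds the action sequence.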


In deterministic systems, repetitions of the same action are not relevant: this leads to the following improved upper bound on the length of the non-adaptive $\sigma'$ that simulates $\sigma$.


Proposition 4.6. If the mechanism $\mathcal{S}$ is deterministic, then the upper bound in the previous theorem can be simplified to $|\mathrm{range}(\sigma)|$.

Proof. Let $\sigma$ be any finite non-adaptive strategy for $\mathcal{S}$. Suppose there is an action $a$ that occurs at least twice in $\sigma$, seen as a tuple of actions, and let $\sigma_-$ be the non-adaptive strategy obtained by removing the first occurrence of $a$ from $\sigma$, seen as a list. Assume the two $a$'s occur at positions $i$ and $j$, $i < j$, of $\sigma$. Since $\mathcal{S}$ is deterministic, it is easily seen that, for each $y^n = (y_1,\ldots,y_n)$, if $y_i \neq y_j$ then for each $x$, $p_\sigma(y^n|x) = 0$ (as submitting twice the same action $a$ cannot give rise to two different answers $y_i$ and $y_j$), and as a consequence $p_\sigma(y^n) = 0$. On the other hand, if $y_i = y_j$, then, denoting by $y^{n-1}$ the sequence obtained by removing $y_i$ from $y^n$, for each $x$ we have: $p_\sigma(y^n|x) = p_{\sigma_-}(y^{n-1}|x)$ (as $p(y_i|x) = p(y_j|x)$ is either 0 or 1), and as a consequence $p_\sigma(y^n) = p_{\sigma_-}(y^{n-1})$ and $p_\sigma(x|y^n) = p_{\sigma_-}(x|y^{n-1})$. This implies that $U(X|Y_\sigma) = U(X|Y_{\sigma_-})$. Repeating this elimination step, we can eventually get rid of all the duplicates in $\sigma$, while preserving the value of $I_\sigma(\mathcal{S},p)$. Applying this fact to the strategy $\sigma'$ defined in the proof of Theorem 4.5, we can come up with a strategy $\sigma''$ of length $|\mathrm{range}(\sigma)|$ such that $I_{\sigma''}(\mathcal{S},p) = I_{\sigma'}(\mathcal{S},p)$. □
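The elimination step can be checked numerically on a toy deterministic mechanism. The sketch below assumes Shannon entropy and represents the mechanism as a dictionary from action names to functions of the secret; the mechanism, the action names and the helper `dedupe` are all hypothetical.

```python
from collections import defaultdict
from math import log2

def leakage(prior, mechanism, actions):
    """Shannon leakage H(X) - H(X|Y) of a non-adaptive strategy (a list of
    action names) against a deterministic mechanism {action: secret -> obs}."""
    blocks = defaultdict(float)  # probability of each observation sequence
    for x, px in prior.items():
        blocks[tuple(mechanism[a](x) for a in actions)] += px
    h_prior = -sum(p * log2(p) for p in prior.values() if p > 0)
    h_post = 0.0
    for x, px in prior.items():
        if px > 0:
            py = blocks[tuple(mechanism[a](x) for a in actions)]
            h_post -= px * log2(px / py)  # p(x|y) = p(x) / p(y) here
    return h_prior - h_post

def dedupe(actions):
    """Drop repeated actions, removing earlier occurrences first."""
    seen, kept = set(), []
    for a in reversed(actions):
        if a not in seen:
            seen.add(a)
            kept.append(a)
    return kept[::-1]

prior = {x: 1 / 8 for x in range(8)}
mechanism = {"low": lambda x: x % 2, "high": lambda x: x // 4}
with_repeat = ["low", "high", "low"]
```

On this example `leakage(prior, mechanism, with_repeat)` and `leakage(prior, mechanism, dedupe(with_repeat))` both evaluate to 2 bits, as the proposition predicts for deterministic mechanisms.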

Example 4.7. We reconsider Example 3.1. For the adaptive strategy $\sigma$ defined in Figure 2, we have already shown that, for $U(\cdot) = H(\cdot)$, $I_\sigma(\mathcal{S},p) \approx 2.45$. Consider now the non-adaptive strategy $\sigma' = [\mathrm{ZIP}, \mathrm{Date}, \mathrm{Age}]$, which is just one action longer than $\sigma$. The corresponding attack tree is reported in Figure 6: the final partition obtained with $\sigma'$ is finer than the one obtained with $\sigma$. In fact, $I_{\sigma'}(\mathcal{S},p) = \log 10 - \frac{2}{5} \approx 2.92 > I_\sigma(\mathcal{S},p)$.

The results discussed above are important from the point of view of the analysis of randomization mechanisms. They entail that, for systems with a moderate number of actions, analyzing adaptive strategies is essentially equivalent to analyzing non-adaptive ones. The latter task can be much easier to accomplish. For example, results on the asymptotic rate of convergence of non-adaptive strategies are available (e.g. [7, Th. IV.3]). They make it possible to assess analytically the resistance of a mechanism as the length of the considered strategies grows.

5. Maximum Leakage

In this section we show that the classes of adaptive and non-adaptive strategies induce the same maximum leakage, where the maximum is taken over all strategies. For truly probabilistic mechanisms, strategies achieving maximum leakage are in general infinite. A key notion is that of indistinguishability: an equivalence relation over $\mathcal{X}$ s.t. $x$ and $x'$ are indistinguishable if, no matter what strategy the adversary will play, he cannot tell them apart.

Definition 5.1 (Indistinguishability). We define the following equivalence relation over $\mathcal{X}$:

$$x \equiv x' \text{ iff for each finite } \sigma:\; p_\sigma(\cdot|x) = p_\sigma(\cdot|x').$$

Despite being based on a universal quantification over all finite strategies, indistinguishability is in fact quite easy to characterize, also computationally. For each $a \in Act$, consider the equivalence relation defined by $x \equiv_a x'$ iff $p_a(\cdot|x) = p_a(\cdot|x')$. We have the following result (see the Appendix for a proof).

Lemma 5.2. $x \equiv x'$ iff for each $a \in Act$, $p_a(\cdot|x) = p_a(\cdot|x')$. In other words, $\equiv$ is $\bigcap_{a \in Act} \equiv_a$.
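Lemma 5.2 suggests a direct way of computing the indistinguishability classes: group together the secrets whose channel rows coincide for every action. The following is a minimal sketch under assumptions: the mechanism is given as a nested dictionary `channel[a][x]` mapping observations to probabilities, rows are compared exactly (so equal rows must be written identically), and all names are illustrative.

```python
from collections import defaultdict

def indistinguishability_classes(secrets, channel):
    """Group secrets x whose rows p_a(.|x) coincide for every action a.
    `channel[a][x]` is a dict {observation: probability}."""
    classes = defaultdict(list)
    for x in secrets:
        signature = tuple(
            (a, tuple(sorted(channel[a][x].items()))) for a in sorted(channel)
        )
        classes[signature].append(x)
    return list(classes.values())

# Hypothetical mechanism: action "a" reveals whether x >= 2, action "b" is pure noise.
secrets = [0, 1, 2, 3]
channel = {
    "a": {0: {0: 1.0}, 1: {0: 1.0}, 2: {1: 1.0}, 3: {1: 1.0}},
    "b": {x: {0: 0.5, 1: 0.5} for x in secrets},
}
classes = indistinguishability_classes(secrets, channel)  # [[0, 1], [2, 3]]
```

The cost is one pass over the secrets per action, which reflects the remark that indistinguishability is easy to characterize computationally.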

Now, consider $\mathcal{X}/\!\equiv$, the set of equivalence classes of $\equiv$, and let $c$ range over this set. Let $[X]$ be the r.v. whose outcome is the equivalence class of $X$ according to $\equiv$. Note that $p(c) = \Pr([X] = c) = \sum_{x \in c} p(x)$. We consistently extend our $I$-notation by defining

$$U(X|[X]) = \sum_c p(c)\, U(p(\cdot|[X] = c)) \quad\text{and}\quad I(X;[X]) = U(X) - U(X|[X]).$$

More explicitly, $p(\cdot|[X] = c)$ denotes the distribution over $\mathcal{X}$ that yields $p(x)/p(c)$ for $x \in c$ and 0 elsewhere; we will often abbreviate $p(\cdot|[X] = c)$ just as $p(\cdot|c)$. Note that $I(X;[X])$ expresses the information gain about $X$ when the attacker gets to know the indistinguishability class of the secret. As expected, this is an upper bound to the information that can be gained by playing any strategy.

Theorem 5.3. $I_\star(\mathcal{S},p) \leq I(X;[X])$.

Proof. Fix any finite strategy $\sigma$ and prior $p(\cdot)$. It is sufficient to prove that $U(X|Y) \geq U(X|[X])$. The proof exploits the concavity of $U$. First, we note that, for each $x$ and $y^j$ of nonzero probability we have ($c$ below ranges over $\mathcal{X}/\!\equiv$):

$$p_\sigma(x|y^j) = \sum_c \frac{p_\sigma(x, y^j, c)}{p_\sigma(y^j)} = \sum_c p_\sigma(c|y^j)\, p_\sigma(x|y^j, c). \tag{5.1}$$

By (5.1), concavity of $U(\cdot)$ and Jensen's inequality

$$U(p_\sigma(\cdot|y^j)) \geq \sum_c p_\sigma(c|y^j)\, U(p_\sigma(\cdot|y^j, c)). \tag{5.2}$$

Now, we can compute as follows (as usual, $y^j$ below runs over sequences of nonzero probability):

$$U(X|Y) = \sum_{y^j} p_\sigma(y^j)\, U(p_\sigma(\cdot|y^j)) \geq \sum_{y^j, c} p_\sigma(y^j)\, p_\sigma(c|y^j)\, U(p_\sigma(\cdot|y^j, c)) \tag{5.3}$$

$$\phantom{U(X|Y)} = \sum_{y^j, c} p_\sigma(y^j)\, p_\sigma(c|y^j)\, U(p(\cdot|c)) = \sum_c \Bigl(\sum_{y^j} p_\sigma(y^j, c)\Bigr) U(p(\cdot|c)) \tag{5.4}$$

$$\phantom{U(X|Y)} = \sum_c p(c)\, U(p(\cdot|c)) = U(X|[X])$$

where (5.3) is justified by (5.2), and the first equality in (5.4) follows from the fact that, for each $x$, $p_\sigma(x|y^j, c) = p(x|c)$ (once the equivalence class of the secret is known, the observation $y^j$ provides no further information about the secret). □

As to the maximal achievable information, we start our discussion from deterministic mechanisms.


Proposition 5.4. Let $\mathcal{S}$ be deterministic. Let $\sigma = [a_1,\ldots,a_k]$ be a non-adaptive strategy that plays all actions in $Act$ once. Then $I_\star(\mathcal{S},p) = I_\sigma(\mathcal{S},p)$.


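Proof. Let $(X,Y) \sim p_\sigma(\cdot)$. We prove that $U(X|Y) = U(X|[X])$. We first note that for each $c \in \mathcal{X}/\!\equiv$ there is exactly one sequence $y_c^k$ s.t. $p_\sigma(y_c^k|c) = 1$: this follows from $\mathcal{S}$ being deterministic. Moreover, if $c \neq c'$ then $y_c^k \neq y_{c'}^k$: otherwise, it would follow that $p_{a_i}(y|c) = p_{a_i}(y|c')$ for each $a_i \in Act$ and $y \in \mathcal{Y}$, contrary to Lemma 5.2 (note that $p_{a_i}(\cdot|c)$ is the same as $p_{a_i}(\cdot|x)$, for any $x \in c$). These facts can be used to show, through easy manipulations, that $p(x|y_c^k) = p(x|c)$ for each $x$. As a consequence, one can compute as follows.

$$U(X|Y) = \sum_{y^k} p_\sigma(y^k)\, U(p_\sigma(\cdot|y^k))$$

$$\phantom{U(X|Y)} = \sum_c p(c) \sum_{y^k} p_\sigma(y^k|c)\, U(p_\sigma(\cdot|y^k))$$

$$\phantom{U(X|Y)} = \sum_c p(c)\, U(p_\sigma(\cdot|y_c^k))$$

$$\phantom{U(X|Y)} = \sum_c p(c)\, U(p_\sigma(\cdot|c))$$

$$\phantom{U(X|Y)} = U(X|[X]).$$

□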


Hence, in the deterministic case, the maximal gain in information is obtained by a trivial brute-force strategy where all actions are played in any fixed order. It is instructive to observe such a strategy at work, under the form of an attack tree. The supports of the distributions that are at the same level constitute a partition of $\mathcal{X}$: more precisely, the partition at level $i$ ($1 \leq i \leq k$) is given by the equivalence classes of the relation $\bigcap_{j=1}^{i} \equiv_{a_j}$. An example of this fact is illustrated by the attack tree in Figure 6, relative to the non-adaptive strategy [ZIP, Date, Age] for the mechanism in Example 3.1. This fact had been already observed in [26] for the restricted model considered there. Indeed, one would obtain the model of [26] by stripping the probabilities off the tree in Figure 6.

The general probabilistic case is more complicated. Essentially, any non-adaptive strategy where each action is played infinitely often achieves the maximum information gain. The next theorem considers one such strategy; the proof of this result is reported in the Appendix.

Figure 6. The attack tree corresponding to the non-adaptive strategy [ZIP, Date, Age] for Example 3.1.

Theorem 5.5. There is a total, non-adaptive strategy $\sigma$ s.t. $I_\sigma(\mathcal{S},p) = I(X;[X])$. Consequently, $I_\star(\mathcal{S},p) = I(X;[X])$.
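To get a feeling for why infinitely many repetitions are needed in the probabilistic case, consider a toy mechanism that is not taken from the paper: a uniform secret bit observed through a single action behaving as a binary symmetric channel with crossover probability 0.2. The two secrets are distinguishable, so $I(X;[X]) = H(X) = 1$ bit. The sketch below, assuming Shannon entropy, computes the leakage after $n$ repetitions of the action.

```python
from math import comb, log2

def leakage_after_repeats(n, flip=0.2):
    """Shannon leakage about a uniform bit X after n independent looks
    through a binary symmetric channel with crossover probability `flip`.
    The number k of observed ones is a sufficient statistic for X."""
    h_post = 0.0  # accumulates H(X | observations)
    for k in range(n + 1):
        # Joint probabilities Pr(X = 0, k ones) and Pr(X = 1, k ones).
        j0 = 0.5 * comb(n, k) * flip**k * (1 - flip) ** (n - k)
        j1 = 0.5 * comb(n, k) * (1 - flip) ** k * flip ** (n - k)
        pk = j0 + j1
        for j in (j0, j1):
            if j > 0:
                h_post -= j * log2(j / pk)
    return 1.0 - h_post

values = [leakage_after_repeats(n) for n in (1, 5, 25, 100)]
```

The entries of `values` increase with `n` and approach, without reaching, the 1-bit bound, which is the behaviour the theorem describes for strategies that repeat each action indefinitely.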

Of course, as shown in the preceding section, finite adaptive strategies can be more efficient in terms of length by a factor of $|Act|$ when compared with non-adaptive ones. Concerning capacity, we do not have a general formula for the maximizing distribution. In what follows, we limit our discussion to two important cases for $U(\cdot)$, Shannon entropy and error entropy. In both cases, capacity only depends on the number $K$ of indistinguishability
classes. For guessing entropy, we conjecture that C{S) = but at the moment a proof 
of this fact escapes us. 

Theorem 5.6. The following formulae hold, where $K = |\mathcal{X}/\!\equiv|$.

• For $U = H$ (Shannon entropy), $C(\mathcal{S}) = \log K$.

• For $U = E$ (Error entropy), $C(\mathcal{S}) = 1 - \frac{1}{K}$.

Proof. Let $x_i$ be any representative of class $c_i$, for $i = 1,\ldots,K$.

• $U = H$. By the symmetry of mutual information in the case of Shannon entropy, we have

$$I(X;[X]) = H([X]) - H([X]|X) = H([X]) = -\sum_{c_i} p(c_i) \log p(c_i) \leq \log K$$

where $H([X]|X) = 0$, and the last inequality follows from the property of Shannon entropy that $H(q) \leq \log|\mathrm{supp}(q)|$, for any distribution $q$. On the other hand, if we take the distribution $p(\cdot)$ defined as $p(x_i) = \frac{1}{K}$, and $p(x) = 0$ elsewhere, we can easily compute that $I(X;[X]) = \log K$.

• $U = E$. Let $p(\cdot)$ be any prior and assume without loss of generality that $p(x_i) = \max_{x \in c_i} p(x)$ for each $i$, and furthermore that $p(x_1) = \max_x p(x)$. By easy manipulations, we have:

$$I(X;[X]) = E(X) - E(X|[X]) = (1 - p(x_1)) - \Bigl(1 - \sum_{i=1}^{K} p(x_i)\Bigr) = \sum_{i=2}^{K} p(x_i).$$

Now it is easily checked that the last term in this chain is $\leq 1 - \frac{1}{K}$: this is done by separately considering the two cases $\max_x p(x) = p(x_1) \leq \frac{1}{K}$ and $\max_x p(x) = p(x_1) > \frac{1}{K}$. On the other hand, if we take, as above, the distribution $p(\cdot)$ defined as $p(x_i) = \frac{1}{K}$ and $p(x) = 0$ elsewhere, we can easily compute that $I(X;[X]) = 1 - \frac{1}{K}$. □

Example 5.7. Consider the mechanism defined in Example 3.1. One has the following capacities: for $U(\cdot) = H(\cdot)$, $C(\mathcal{S}) = \log 8 = 3$, while for $U(\cdot) = E(\cdot)$, $C(\mathcal{S}) = \frac{7}{8} = 0.875$.
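The two capacity figures of the example depend only on the number of indistinguishability classes, which equals 8 there. A minimal sketch of the arithmetic follows; the function names are illustrative, and `error_leak` assumes the prior is a dictionary from secrets to probabilities and the classes are lists of secrets.

```python
from math import log2

def shannon_capacity(k):
    """C(S) = log K for Shannon entropy (in bits)."""
    return log2(k)

def error_capacity(k):
    """C(S) = 1 - 1/K for error entropy."""
    return 1 - 1 / k

def error_leak(prior, classes):
    """I(X;[X]) = E(X) - E(X|[X]) for error entropy, written as
    sum over classes of the largest prior mass, minus the largest mass overall."""
    return sum(max(prior[x] for x in c) for c in classes) - max(prior.values())

K = 8
uniform_on_representatives = {i: 1 / K for i in range(K)}
singletons = [[i] for i in range(K)]
```

For `K = 8`, `shannon_capacity(K)` gives 3 and both `error_capacity(K)` and `error_leak(uniform_on_representatives, singletons)` give 0.875, in agreement with the example.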

6. Computing Optimal Finite Strategies

We show that, for finite strategies, $I_\sigma(\mathcal{S},p)$ can be expressed recursively as a Bellman equation. This allows for calculation of optimal finite strategies based on standard algorithms, such as backward induction.

6.1. A Bellman Equation. Let us introduce some terminology. For each $y$, the $y$-derivative of $\sigma$, denoted $\sigma_y$, is the function defined thus, for each $y^j \in \mathcal{Y}^*$: $\sigma_y(y^j) = \sigma(y y^j)$. Note that if $\sigma$ has length $l > 1$, then $\sigma_y$ is a strategy of height $\leq l - 1$. For $l = 1$, $\sigma_y$ is the empty function. Recall that according to (2.3), for $h = ay$, we have

$$p^{ay}(x) = p_a(x|y).$$

By convention, we let $I_\sigma(\cdots)$ denote 0 when $\sigma$ is empty. Moreover, we write $I_{[a]}(\cdots)$ as $I_a(\cdots)$.

Lemma 6.1. Let $p(\cdot)$ be any prior on $\mathcal{X}$. Let $\sigma$ be a strategy with $\sigma(\varepsilon) = a$. Then

$$I_\sigma(\mathcal{S};p) = I_a(\mathcal{S};p) + \sum_y p_a(y)\, I_{\sigma_y}(\mathcal{S};p^{ay}).$$

We introduce some additional notation to be used in the proof of this lemma. Let $l$ denote the length of a strategy $\sigma$, and let $(X,Y)$ be distributed according to $p_\sigma(\cdot)$. We can decompose $Y$ as the concatenation of the first observation and whatever sequence of observations is left, thus: $Y = Y_1 \cdot Y_2$. Here, $Y_1$ takes values on $\mathcal{Y}$, while $Y_2$ takes values on a subset of $\bigcup_{0 \leq j \leq l-1} \mathcal{Y}^j$; in particular, if $l = 1$, $Y_2$ takes on the value $\varepsilon$ with probability 1. In what follows, we denote the marginal distribution of $Y_1$ under $\sigma$ just as $p_\sigma(y)$, and that of $Y_2$ as $p_\sigma(y^j)$, for generic $y$ and $y^j$.

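Footnote. In terms of a given prior $p(\cdot)$ and of the matrices of $\mathcal{S}$, the update $p^{ay}$ can also be expressed as: $p^{ay}(x) = \dfrac{p_a(y|x)\,p(x)}{\sum_{x'} p_a(y|x')\,p(x')}$.

Proof (of Lemma 6.1). It is an easy matter to prove the following equations: for each prior $p(\cdot)$, finite strategy $\sigma$ with $\sigma(\varepsilon) = a$, sequence $y^j$, observation $y$, one has (below, $y$ and $y^j$ run over elements of nonzero probability; moreover, for any prior $p(\cdot)$, history $h$ and strategy $\sigma$, the term $p^h_\sigma$ is to be parsed as $(p^h)_\sigma$):

$$p_\sigma(y) = p_a(y) \tag{6.1}$$

$$p_\sigma(x|y) = p_a(x|y) = p^{ay}(x) \tag{6.2}$$

$$p_\sigma(x|y y^j) = p^{ay}_{\sigma_y}(x|y^j) \tag{6.3}$$

$$p_\sigma(y^j|y) = p^{ay}_{\sigma_y}(y^j) \tag{6.4}$$

By applying equalities (6.1), (6.2), (6.3) and (6.4) above as appropriate, we have:

$$I_\sigma(\mathcal{S};p) = I(X;Y)$$

$$= [U(X) - U(X|Y_1)] + [U(X|Y_1) - U(X|Y)]$$

$$= \Bigl[U(p) - \sum_y p_\sigma(y)\, U(p_\sigma(\cdot|y))\Bigr] + \Bigl[\sum_y p_\sigma(y)\, U(p_\sigma(\cdot|y)) - \sum_{y, y^j} p_\sigma(y, y^j)\, U(p_\sigma(\cdot|y y^j))\Bigr]$$

$$= \Bigl[U(p) - \sum_y p_a(y)\, U(p_a(\cdot|y))\Bigr] + \Bigl[\sum_y p_a(y)\, U(p_\sigma(\cdot|y)) - \sum_{y, y^j} p_a(y)\, p_\sigma(y^j|y)\, U(p_\sigma(\cdot|y y^j))\Bigr]$$

$$= \Bigl[U(p) - \sum_y p_a(y)\, U(p_a(\cdot|y))\Bigr] + \sum_y p_a(y) \Bigl[U(p_\sigma(\cdot|y)) - \sum_{y^j} p_\sigma(y^j|y)\, U(p_\sigma(\cdot|y y^j))\Bigr]$$

$$= I_a(\mathcal{S};p) + \sum_y p_a(y) \Bigl[U(p^{ay}) - \sum_{y^j} p^{ay}_{\sigma_y}(y^j)\, U(p^{ay}_{\sigma_y}(\cdot|y^j))\Bigr]$$

$$= I_a(\mathcal{S};p) + \sum_y p_a(y)\, I_{\sigma_y}(\mathcal{S};p^{ay}).$$

□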


Let us say that a strategy $\sigma$ of length $l$ is optimal for $\mathcal{S}$, $p(\cdot)$ and $l$ if it maximizes $I_\sigma(\mathcal{S},p)$ among all strategies of length $l$.


Corollary 6.2 (Bellman-type equation for optimal strategies). There is an optimal strategy $\sigma^*$ of length $l$ for $\mathcal{S}$ and $p(\cdot)$ that satisfies the following equation

$$I_{\sigma^*}(\mathcal{S};p) = \max_a \Bigl\{ I_a(\mathcal{S};p) + \sum_{y:\,p_a(y)>0} p_a(y)\, I_{\sigma^*_{a,y}}(\mathcal{S};p^{ay}) \Bigr\} \tag{6.5}$$

where $\sigma^*_{a,y}$ is an optimal strategy of length $l - 1$ for $\mathcal{S}$ and $p^{ay}(\cdot)$.

Corollary 6.2 allows us to employ dynamic programming or backward induction to compute optimal finite strategies. We discuss this briefly in the next subsection.


6.2. Markov Decision Processes and Backward Induction. A mechanism $\mathcal{S}$ and a prior $p(\cdot)$ induce a Markov Decision Process (MDP), where all possible attack trees are represented at once. Backward induction amounts to recursively computing the most efficient attack tree out of this MDP, limited to a given length. More precisely, the MDP $\mathcal{M}$ induced by $\mathcal{S}$ and a prior $p(\cdot)$ is an in general infinite tree consisting of decision nodes and probabilistic nodes. Levels of decision nodes alternate with levels of probabilistic nodes, starting from the root which is a decision node. Decision nodes are labelled with probability distributions over $\mathcal{X}$, edges outgoing decision nodes with actions, and edges outgoing probabilistic nodes with pairs $(y, \lambda)$ of an observation and a real, in such a way that (again, we identify nodes with the corresponding history):

• a decision node corresponding to history $h$ is labelled with $p^h$, if this is defined, otherwise the node and its descendants are removed, as well as the incoming edge;

• for any pair of consecutive edges leading from a decision node $h$ to another decision node $hay$, for any $a \in Act$ and $y \in \mathcal{Y}$, the edge outgoing the probabilistic node is labelled with $(y, p^h_a(y))$.

Figure 7 shows the first few levels of such an MDP.

Figure 7. The first few levels of an MDP induced by a prior $p(\cdot)$ and a mechanism with $Act = \{a, b\}$ and $\mathcal{Y} = \{y, y'\}$. Round nodes are decision nodes and square nodes are probabilistic nodes. For the sake of space, labels of the last level of arcs and nodes are only partially shown.

In order to compute an optimal strategy of length $l \geq 1$ by backward induction, one initially prunes the tree at the $l$-th decision level (the root is at level 0) and then assigns rewards to all leaves of the resulting tree. Moreover, each probabilistic node is assigned an immediate gain. Rewards are then gradually propagated from the leaves up to the root, as follows:

• each probabilistic node is assigned as a reward the sum of its immediate gain and the 
average reward of its children, average computed using the probabilities on the outgoing 
arcs; 

• each decision node is assigned the maximal reward of its children; the arc leading to the 
maximizing child is marked or otherwise recorded. 

Eventually, the root will be assigned the maximal achievable reward. Moreover, the paths of marked arcs starting from the root will define an optimal strategy of length $l$. We can apply this procedure to our problem, starting with assigning reward 0 to each leaf node $h$, and immediate gain $I_a(\mathcal{S},p^h)$ to each $a$-child of any decision node $h$. The correctness of the resulting procedure is obvious in the light of Corollary 6.2.
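The procedure just outlined can be sketched as a direct recursion on equation (6.5). The code below is a minimal illustration under assumptions, not the authors' implementation: the mechanism is a nested dictionary `channel[a][x]` from observations to probabilities, $U$ defaults to Shannon entropy, no memoisation is attempted, and the returned strategy is a nested pair `(action, {observation: substrategy})`. All names are hypothetical.

```python
from math import log2

def entropy(dist):
    """Shannon entropy (bits) of a dict {secret: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def update(prior, rows, y):
    """Return (Pr(y), posterior over secrets) after observing y,
    where rows[x][y] = p_a(y|x); posterior is None if Pr(y) = 0."""
    joint = {x: px * rows[x].get(y, 0.0) for x, px in prior.items()}
    py = sum(joint.values())
    post = {x: j / py for x, j in joint.items()} if py > 0 else None
    return py, post

def optimal(prior, channel, observations, length, U=entropy):
    """Backward induction: best leakage and a strategy achieving it."""
    if length == 0:
        return 0.0, None  # leaves carry reward 0
    best_value, best_tree = float("-inf"), None
    for a, rows in channel.items():
        value, children = U(prior), {}
        for y in observations:
            py, post = update(prior, rows, y)
            if py > 0:
                sub_value, sub_tree = optimal(post, channel, observations, length - 1, U)
                # Immediate gain contributes -py * U(post); add the child's reward.
                value += py * (sub_value - U(post))
                children[y] = sub_tree
        if value > best_value:
            best_value, best_tree = value, (a, children)
    return best_value, best_tree

# Hypothetical deterministic mechanism on secrets 0..3.
secrets = range(4)
channel = {
    "low": {x: {x % 2: 1.0} for x in secrets},
    "high": {x: {x // 2: 1.0} for x in secrets},
}
prior = {x: 0.25 for x in secrets}
value, tree = optimal(prior, channel, (0, 1), 2)
```

On this toy mechanism two steps suffice to learn the secret, so `value` is 2 bits and `tree` plays one of the two actions first and the other one next. The running time is exponential in `length`, in line with the crude bound on the number of decision nodes.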



Figure 8. A Shannon entropy optimal strategy for Example 3.2. Leaves with the same label and their incoming arcs have been coalesced.

In a crude implementation of the above outlined procedure, the number of decision nodes in the MDP will be bounded by $(|\mathcal{Y}| \times |Act|)^{l+1} - 1$ (probabilistic nodes can be dispensed with, at the cost of moving incoming action labels to outgoing arcs). Assuming that each distribution is stored in space $O(|\mathcal{X}|)$, the MDP can be built and stored in time and space $O(|\mathcal{X}| \times (|\mathcal{Y}| \times |Act|)^{l+1})$. This is also the running time of the backward induction outlined above, assuming $U(\cdot)$ can be computed in time $O(|\mathcal{X}|)$ (some straightforward optimizations are possible here, but we will not dwell on this). By comparison, the running time of the
exhaustive procedure for deterministic systems outlined in [261 Th.l] is 0{l x x \X\ x 

log IT’D, where r is the maximal number of classes in any relation =a; since r can be as 
large as |T|, this gives a worst-case running time of 0{l x x \X I X log IT’D- 

Example 6.3. Applying backward induction to the mechanism of Example 3.2 with $U(\cdot) = H(\cdot)$ and $l = 2$, one gets the optimal strategy $\sigma$ shown in Figure 8, with $I_\sigma(\mathcal{S},p) \approx 2.4$ bits. Some details of the derivation of this strategy are reported in Appendix C.


7. The role of concavity 

We elucidate a connection between our definition of uncertainty (concavity + continuity)
and a concept of scoring rule in Bayesian decision theory. A scoring rule encodes a system 
of (dis)incentives: a wrong forecast about an event causes the forecaster a loss, whose 
magnitude depends on both the forecast that has been put forward, and on the event 
that has actually occurred. The average loss under the best forecast is named entropy 
in this context. In essence, we will show that: (a) every proper scoring rule induces an 
entropy function that is concave, hence necessarily continuous at least in the interior of the 
probability simplex; (b) every concave function arises, under an additional mild assumption, 
as the entropy induced by a certain scoring rule. This almost complete correspondence, and 
its simple definition, give a strong support to our choice of the class of uncertainty functions. 

The connection between concavity and uncertainty has been explored in Statistics at 
least starting from the 1950s, and it comes in many different flavours. The following
discussion is our personal take of this issue, for which we claim no technical novelty. Our 
presentation is partly inspired by m Sections 9-10]. For notational simplicity, in what 
follows we fix any ordering of the elements of $\mathcal{X}$, say $x_1,\ldots,x_n$, so that we can identify any distribution $p$ with a vector $(p_1,\ldots,p_n) \in \mathbb{R}^n$.

Formally, in the context of Bayesian decision theory, a scoring rule is a function

$$S : \mathcal{X} \times \mathring{\mathcal{P}} \rightarrow \mathbb{R}$$

where $\mathring{\mathcal{P}}$ denotes here the interior part of $\mathcal{P}$, that is, the set of those distributions $q$ s.t. $q(x) > 0$ for each $x \in \mathcal{X}$. This function is given the following interpretation. A forecaster (in our case, the adversary) is asked to put forward a forecast about the outcome of an event (in our case, the secret). A forecast takes the form of a probability distribution $q$, which represents the forecaster's estimation of the probability of each possible outcome $x \in \mathcal{X}$. Then $S(x,q)$ represents the loss incurred by the forecaster when the outcome is actually $x$ and he has put forward $q$. If the outcome is distributed according to $p$, the average loss incurred if putting forward $q \in \mathring{\mathcal{P}}$ is given by

$$S(p,q) = \sum_x p(x)\, S(x,q).$$

This definition is extended to each $q$ in the frontier, $q \in \mathcal{P} \setminus \mathring{\mathcal{P}}$, by $\liminf_{q' \to q} S(p,q')$, be this finite or infinite. The scoring rule $S$ is called proper if the choice $q = p$ always minimizes $S(p,q)$. In other words, a proper scoring rule (PSR) encodes a penalty system that forces the forecaster to be honest and propose the distribution he really thinks is the true one. For a PSR, the minimal loss corresponding to $p$ is, perhaps not surprisingly, also called the induced entropy

$$H(p) = S(p,p).$$

Seen as a function of $p$, $H(p)$ gives a measure of the intrinsic risk associated with each $p$, under the loss model encoded by the PSR $S$.


Proposition 7.1. Any entropy function induced by a PSR is a concave function over $\mathcal{P}$. As a consequence, it is continuous over $\mathring{\mathcal{P}}$.


Proof. Let $H(p) = S(p,p)$, with $S$ a PSR. Consider any $\lambda \in [0,1]$ and $p, q \in \mathcal{P}$, and let $r = \lambda p + (1-\lambda)q$. Assume first that $r$ is in the interior of $\mathcal{P}$. Then

$$H(r) = S(r,r)$$

$$= S(\lambda p + (1-\lambda)q, r)$$

$$= \lambda S(p,r) + (1-\lambda) S(q,r)$$

$$\geq \lambda S(p,p) + (1-\lambda) S(q,q)$$

$$= \lambda H(p) + (1-\lambda) H(q)$$

where the third equality follows from the linearity of $S$ w.r.t. the first argument and the inequality follows from the definition of PSR. If $r$ is in the frontier of $\mathcal{P}$, by the properties of $\liminf$, one has

$$H(r) = \liminf_{r' \to r} S(r,r')$$

$$= \liminf_{r' \to r} S(\lambda p + (1-\lambda)q, r')$$

$$= \liminf_{r' \to r} \bigl[\lambda S(p,r') + (1-\lambda) S(q,r')\bigr]$$

$$\geq \lambda \liminf_{r' \to r} S(p,r') + (1-\lambda) \liminf_{r' \to r} S(q,r')$$

$$= \lambda S(p,r) + (1-\lambda) S(q,r)$$

and then the reasoning proceeds as above. This concludes the proof that $H$ is concave. Finally, it is a standard result that concavity over $\mathcal{P}$ implies continuity (in fact, local lipschitzianity) over $\mathring{\mathcal{P}}$ (see e.g. [10]). □


On the other hand, under a mild additional assumption, any concave function is induced by a PSR, as we will check shortly. Let $H$ be concave on $\mathcal{P}$. It is well known that concave functions enjoy the following supporting hyperplane property (see e.g. [10]). For each $q \in \mathring{\mathcal{P}}$, there exists a vector $c_q = (c_1,\ldots,c_n) \in \mathbb{R}^n$ such that, for each $p \in \mathcal{P}$ (here $\langle\cdot,\cdot\rangle$ denotes the usual scalar product between two vectors)

$$H(p) \leq H(q) + \langle c_q, p - q\rangle = H(q) + \sum_{i=1}^{n} c_i (p_i - q_i). \tag{7.1}$$

The above relation merely means that the graph of $H$ lies entirely below a hyperplane that is tangent to it at the point $(q, H(q))$. A vector $c_q$ satisfying (7.1) is called a subgradient of $H$ at $q$. In particular, if $H$ is differentiable at $q$, there is exactly one choice for $c_q$, namely the gradient of $H$ at $q$

$$c_q = \nabla H(q) = \Bigl(\frac{\partial H}{\partial p_1}(q),\ldots,\frac{\partial H}{\partial p_n}(q)\Bigr).$$

Now, for $q \in \mathring{\mathcal{P}}$, we set

$$S(x_i,q) = H(q) + c_i - \langle c_q, q\rangle. \tag{7.2}$$

Clearly, for each $p$, we have $S(p,q) = E_p[S(X,q)] = H(q) + \langle c_q, p - q\rangle$. We extend this to $q$ in the frontier by $S(p,q) = \liminf_{q' \to q} S(p,q')$. Assume now that the vectors $c_q$ can be chosen in such a way that the following property holds true for any $p$ in the frontier

$$\liminf_{p' \to p} \langle c_{p'}, p - p'\rangle = 0. \tag{7.3}$$

This condition means that, as $p'$ approaches a point $p$ in the frontier, the tangent hyperplane at $p'$ does not approach a vertical hyperplane "too fast". We now check that $S(x,q)$ is a PSR. First, note that for any $p \in \mathcal{P}$, we have $S(p,p) = H(p)$. For $p$ in the interior, this follows by definition; while for $p$ in the frontier, we note that $S(p,p) = \liminf_{p' \to p} S(p,p') = H(p) + \liminf_{p' \to p} \langle c_{p'}, p - p'\rangle$ and then exploit (7.3). Now, applying the supporting hyperplane property (7.1) for $q$ in the interior we have

$$S(p,q) = E_p[S(X,q)]$$

$$= H(q) + \langle c_q, p - q\rangle$$

$$\geq H(p)$$

$$= S(p,p).$$

For $q$ in the frontier, the same property follows from

$$S(p,q) = \liminf_{q' \to q} S(p,q') \geq \liminf_{q' \to q} S(p,p) = S(p,p),$$

where the inequality follows from above. This shows that $S(x,q)$ is a PSR and that the induced entropy is precisely the given concave function $H$. In other words, we have just shown the following proposition.

Proposition 7.2. Every concave function over $\mathcal{P}$ that respects (7.3) is the entropy induced by some PSR.

Example 7.3. Let us consider the Shannon entropy function $H(p) = -\sum_i p_i \log p_i$. By direct calculation, it is immediate to check that $S(x_i,q) = -\log q_i$ is a PSR for Shannon entropy. However, it is instructive to apply the recipe in the proof of Proposition 7.2 to reconstruct $S(x,q)$.

$H(p)$ is differentiable in the interior of the probability simplex. The gradient of $H$ at $q$ is

$$\nabla H(q) = (-(1 + \log q_1),\ldots,-(1 + \log q_n)) = c_q.$$

Now easy calculations show that $\langle c_{p'}, p - p'\rangle = H(p) - H(p') + D(p\|p') \rightarrow 0$ as $p' \rightarrow p$, thus making the condition (7.3) true. Noting that $\langle c_q, q\rangle = H(q) - 1$, we apply (7.2) and define

$$S(x_i,q) = H(q) - (1 + \log q_i) - (H(q) - 1) = -\log q_i.$$
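The recipe (7.2) is mechanical enough to be written down once and applied to any differentiable entropy. The following sketch is illustrative only: it assumes natural logarithms (so that the gradient is $-(1 + \log q_i)$ as above), represents distributions as lists of probabilities in the interior of the simplex, and uses hypothetical names.

```python
from math import log

def score_from_gradient(H, grad):
    """Scoring rule built as in (7.2): S(x_i, q) = H(q) + c_i - <c_q, q>."""
    def S(i, q):
        c = grad(q)
        return H(q) + c[i] - sum(ci * qi for ci, qi in zip(c, q))
    return S

def expected_score(S, p, q):
    """S(p, q) = sum_i p_i S(x_i, q)."""
    return sum(pi * S(i, q) for i, pi in enumerate(p))

# Shannon entropy in nats and its gradient on the interior of the simplex.
H = lambda q: -sum(qi * log(qi) for qi in q)
grad_H = lambda q: [-(1 + log(qi)) for qi in q]
S_log = score_from_gradient(H, grad_H)

p = [0.6, 0.3, 0.1]
q = [0.2, 0.3, 0.5]
```

For these values `S_log(i, q)` coincides with `-log(q[i])` up to rounding, `expected_score(S_log, p, p)` equals `H(p)`, and `expected_score(S_log, p, q)` is not smaller than `expected_score(S_log, p, p)`, which is the properness condition.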

Footnote. Here $D(p\|p') = \sum_i p_i \log(p_i/p'_i)$ is the familiar Kullback-Leibler divergence. Note in particular that $D(p\|p') \rightarrow 0$ as $p' \rightarrow p$. In passing, it is not true in general that $D(p\|p') \rightarrow 0$ as $p \rightarrow p'$.

A similar calculation for $H(p) = \mathrm{var}(p)$, whose gradient at $q$ has components $c_i = x_i^2 - 2x_i E_q[X]$, yields

$$S(x_i,q) = \mathrm{var}(q) + x_i^2 - 2x_i E_q[X] - E_q[X^2] + 2E_q[X]^2 = (x_i - E_q[X])^2$$

which leads to $S(p,q) = E_p[X^2] - 2E_p[X]E_q[X] + E_q[X]^2$ and, as expected, $S(p,p) = \mathrm{var}(p)$.

We finally consider the case of error entropy, $H(p) = 1 - \max_i p_i$. For each $q$ let us denote by $j_q$ an index in $\{1,\ldots,n\}$ such that $q_{j_q} = \max_i q_i$ (if there is more than one such index, choose one arbitrarily). At each $q$ with a single maximal element, $H$ is differentiable and we can set (here $\delta_{xy}$ is the Kronecker delta symbol)

$$c_q = \nabla H(q) = (-\delta_{1 j_q},\ldots,-\delta_{n j_q}).$$

Hence, applying (7.2) and noting that $\langle c_q, q\rangle = -q_{j_q}$, for each such $q$ we let

$$S(x_i,q) = (1 - q_{j_q}) - \delta_{i j_q} + q_{j_q} = 1 - \delta_{i j_q}.$$

At points $q$ with more than one maximal element, $H$ is not differentiable, and the choice of $c_q$ is not unique. However, $c_q = (-\delta_{1 j_q},\ldots,-\delta_{n j_q})$ is still a subgradient, so the same definition of $S(x_i,q)$ as above applies. Note that $S(p,q) = 1 - p_{j_q}$ and, in particular, $S(p,p) = 1 - p_{j_p} = 1 - \max_i p_i$ as expected.
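The scoring rule obtained for error entropy is the familiar zero-one loss on the most likely outcome. A minimal sketch follows, with distributions as lists of probabilities, ties broken by taking the first maximal index (one of the arbitrary choices allowed above), and illustrative function names.

```python
def zero_one_score(i, q):
    """S(x_i, q) = 1 - delta(i, j_q), with j_q the first index maximising q."""
    j_q = max(range(len(q)), key=lambda j: q[j])
    return 0.0 if i == j_q else 1.0

def expected_score(p, q):
    """S(p, q) = sum_i p_i S(x_i, q) = 1 - p_{j_q}."""
    return sum(pi * zero_one_score(i, q) for i, pi in enumerate(p))

p = [0.5, 0.3, 0.2]
forecasts = [[0.5, 0.3, 0.2], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
losses = [expected_score(p, q) for q in forecasts]
```

The first entry of `losses` is $S(p,p) = 1 - \max_i p_i = 0.5$, and no other forecast in the list achieves a smaller expected loss, as properness requires.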


8. Conclusion, Related and Further Work 

We have proposed a general information-theoretic model for the analysis of confidentiality 
under adaptive attackers. Within this model, we have proven several results on the limits 
of such attackers, on the relations between adaptive and non-adaptive strategies, and on 
the problem of searching for optimal finite strategies. We have also elucidated a connection 
between our notion of uncertainty function and Bayesian decision theory. 

8.1. Related Work. In [26], Köpf and Basin introduced an information-theoretic model of adaptive attackers for deterministic mechanisms. Their analysis is conducted essentially on the case of uniform prior distributions. Our model generalizes [26] in several respects: we consider probabilistic mechanisms, generic priors and generic uncertainty functions. More importantly, we contrast quantitatively the efficiency of adaptive and non-adaptive strategies, we characterize maximum leakage of infinite strategies, and we show how to express information leakage as a Bellman equation. The latter leads to search algorithms for optimal strategies that, when specialized to the deterministic case, are more time-efficient than the exhaustive search outlined in [26] (see Section 6).

Our previous paper [7] tackles multirun, non-adaptive adversaries, in the case of min- 
entropy leakage. In this simpler setting, a special case of the present framework, one 
manages to obtain simple analytical results, such as the exact convergence rate of the 
adversary’s success probability as the number of observations goes to infinity. 

Alvim et al. [1] study information flow in interactive mechanisms, described as probabilistic automata where secrets and observables are seen as actions that alternate during
execution. Information-theoretically, they characterize these mechanisms as channels with 
feedback, giving a Shannon-entropy based definition of leakage. Secret actions at each step 
depend on previous history, but it is not clear that this gives the adversary any ability to 
adaptively influence the next observation, in our sense. 




M. BOREALE AND F. PAMPALONI 


In [2], Alvim et al. study g-leakage, a generalization of min-entropy leakage, where the adversary's benefit deriving from a guess about a secret is specified using a gain function g: intuitively, the closer the guess to the secret, the higher the gain. Alvim et al. derive general results about g-leakage, including bounds between min-capacity, g-capacity and Shannon capacity. Gain functions are conceptually very close to the proper scoring rules (PSRs) we considered in Section 7. Abstracting from the unimportant difference of encoding gains rather than losses, a gain function can be seen in fact as a special case of a PSR where the forecast put forward by the forecaster is always a Dirac delta. One important technical difference between [2] on one side and the framework of PSRs and our paper on the other side is that entropy functions, as defined in [2], are all generalizations of the familiar min-entropy. As such, they are in general neither concave nor convex. A thorough investigation of the connections between g-leakage and PSRs is left for future work. More or less contemporary to the short version of the present paper is Mardziel et al.'s [29], which extends the analysis via g-leakage functions to systems with memory. This work is similar in spirit to ours, but now successive responses to queries may not be independent, as the secret evolves over time. They too utilize backward induction to calculate leakage. As in the static case, this dynamic g-leakage does not lend itself to be recast in the present framework. A dynamic approach is also at the core of a model in [5] based on Hidden Markov Models, where the observed system evolves over time, although the secret is fixed.

After the publication of the short version of the present paper [8], we learned from a
statistician colleague about the existence of a large body of work in Bayesian forecasting
and PSRs. A good synthesis of this research can be found in the works of DeGroot, see
e.g. [18, 19] and references therein, although the terminology used there is slightly different
(utility functions are considered rather than scoring rules). Remarkably, [18] contains
considerations on sequential observations and decisions which, despite the different terminology
and emphases, come very close to our adaptive model of QIF. Dawid m and Gneiting and
Raftery [21] give modern accounts of these themes. The role of concavity in QIF is also
central to some recent works by McIver et al. [SolEI]. An important result is the presentation
of a prior vulnerability as a “disorder test” that is, interestingly, defined in terms of
continuous and concave functions. It would be interesting to see how much these approaches
share with the one based on PSRs.

The use of the Bellman equation and backward induction, applied to multi-threaded
programs, in a context where strategies are schedulers, is also found in [TH]. [25, 23] propose
models to assess system security against classes of adversaries characterized by user-specified
‘profiles’. While these models share some similarities with ours (in particular, they too
employ MDPs to keep track of possible adversary strategies), their intent is quite different from
ours: they are used to build and assess analysis tools, rather than to obtain analytical
results. Also, the strategies they consider are tailored to the worst-case adversary’s utility, which,
differently from our average-case measures, is not apt to express information leakage.

8.2. Further work. There are several directions worth pursuing, starting from the
present work. First, one would like to implement and experiment with the search algorithm
described in Section [H. Adaptive querying of datasets might represent an ideal ground for
evaluation of such algorithms. Second, one would like to investigate worst-case variations
of the present framework: an interesting possibility is to devise an adaptive version of
Differential Privacy [21, 22] or one of its variants [9]. Finally, connections between (adaptive)
QIF and PSRs in Statistics deserve to be further explored.

Appendix A. Proofs of Lemma 2.7 and of Lemma 5.2

Lemma A.1 (Lemma 2.7). $I_U(X;Y) \geq 0$. Moreover $I_U(X;Y) = 0$ if $X$ and $Y$ are
independent.

Proof. First note that, by a simple manipulation relying on Bayes’ theorem, we can express
the prior distribution $p(\cdot)$ as follows, for each $x \in \mathcal{X}$ (summation ranges over $y$ of positive
probability):

\[
p(x) = \sum_y p(y)\, p(x|y) .
\]

Hence by concavity of $U$ and Jensen’s inequality:

\[
\begin{aligned}
U(X) &= U(p) \\
     &= U\Big(\sum_y p(y)\, p(\cdot|y)\Big) \\
     &\geq \sum_y p(y)\, U\big(p(\cdot|y)\big) \\
     &= U(X|Y) .
\end{aligned}
\]

Moreover, if $X$ and $Y$ are independent, then for each $y$ of positive probability, $p(x) = p(x|y)$,
hence $\sum_y p(y)\, p(\cdot|y) = p(\cdot)$, so that $U(X) = U(p) = U(X|Y)$. □
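
The inequality of Lemma A.1 can be checked numerically on small instances. The following Python sketch is only an illustration under stated assumptions: it takes Shannon entropy as the uncertainty measure $U$, and the joint distribution `joint` and all helper names are hypothetical, chosen for the example.

```python
import math

def shannon(p):
    """Shannon entropy (in bits) of a distribution given as a list of probabilities."""
    return -sum(v * math.log2(v) for v in p if v > 0)

def conditional_uncertainty(joint, U):
    """U(X|Y) = sum_y p(y) U(p(.|y)) for a joint table joint[x][y]."""
    n_x, n_y = len(joint), len(joint[0])
    total = 0.0
    for y in range(n_y):
        p_y = sum(joint[x][y] for x in range(n_x))
        if p_y > 0:  # the summation ranges over y of positive probability
            posterior = [joint[x][y] / p_y for x in range(n_x)]
            total += p_y * U(posterior)
    return total

# Hypothetical joint distribution p(x, y): 3 secrets (rows), 2 observables (columns).
joint = [[0.30, 0.10],
         [0.05, 0.25],
         [0.15, 0.15]]
prior = [sum(row) for row in joint]
leak = shannon(prior) - conditional_uncertainty(joint, shannon)

# Independent case: p(x, y) = p(x) p(y), so the leakage should vanish.
p_y = [0.4, 0.6]
indep = [[px * py for py in p_y] for px in prior]
leak_indep = shannon(prior) - conditional_uncertainty(indep, shannon)

print(leak, leak_indep)
```

Any other concave function on distributions could be passed in place of `shannon`; the Jensen step in the proof does not depend on the specific choice.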

Lemma A.2 (Lemma 5.2). $x \equiv x'$ iff for each $a \in Act$, $p_a(\cdot|x) = p_a(\cdot|x')$. In other words,
$\equiv$ is $\bigcap_{a \in Act} \equiv_a$.

Proof. Let $x, x' \in \mathcal{X}$. Assume first $x \equiv_a x'$ for each action $a$. Let $\sigma$ be any finite strategy
and $y^n y_{n+1}$ a sequence such that $y^n \in \mathrm{dom}(\sigma)$ and $y^n y_{n+1} \notin \mathrm{dom}(\sigma)$. From (2.1), we know
that

\[
\begin{aligned}
p_\sigma(y^n y_{n+1}|x)  &= p_{a_1}(y_1|x) \cdots p_{a_n}(y_n|x)\, p_{a_{n+1}}(y_{n+1}|x) \\
p_\sigma(y^n y_{n+1}|x') &= p_{a_1}(y_1|x') \cdots p_{a_n}(y_n|x')\, p_{a_{n+1}}(y_{n+1}|x')
\end{aligned}
\]

where $a_1 = \sigma(\varepsilon)$, $a_2 = \sigma(y_1)$, ..., $a_n = \sigma(y_1 \cdots y_{n-1})$ and $a_{n+1} = \sigma(y^n)$. From the above
equations, it is immediate to conclude that, as $p_a(\cdot|x) = p_a(\cdot|x')$ for each action $a$, then
$p_\sigma(y^n y_{n+1}|x) = p_\sigma(y^n y_{n+1}|x')$. Since this holds for any finite $\sigma$, we conclude $x \equiv x'$. The
other direction is obvious, as $p_a(\cdot|x) = p_{[a]}(\cdot|x)$ (where $[a]$ is a length 1 strategy) for any $a$
and $x$. □


Appendix B. Proof of Theorem 5.5

In order to prove Theorem 5.5 we introduce some terminology and concepts from the
information-theoretic method of types US]. For distributions $p(\cdot)$ and $q(\cdot)$ on $\mathcal{Y}$, we let their
Kullback-Leibler (KL) divergence be defined as: $D(p\|q) = \sum_y p(y) \log \frac{p(y)}{q(y)}$, with the proviso
that $0 \cdot \log \frac{0}{q(y)} = 0$ and $p(y) \cdot \log \frac{p(y)}{0} = +\infty$ for $p(y) > 0$. Given $n \geq 1$, and a sequence $y^n \in
\mathcal{Y}^n$, the type (or empirical distribution) of $y^n$, denoted $t_{y^n}$, is the probability distribution
over $\mathcal{Y}$ defined thus: $t_{y^n}(y) = \frac{n(y|y^n)}{n}$, where $n(y|y^n)$ denotes the number of occurrences of $y$
in $y^n$. In this section, $H$ will always stand for Shannon entropy, $H(p) = -\sum_x p(x) \log p(x)$.
We will often abbreviate $H(t_{y^n})$ as $H(y^n)$, and $D(t_{y^n}\|q)$ as $D(y^n\|q)$, thus denoting the
type by a corresponding sequence, when no confusion arises. Given $\epsilon > 0$ and a probability
distribution $q(\cdot)$ on $\mathcal{Y}$, the “ball” of $n$-sequences whose type is within distance $\epsilon$ of $q(\cdot)$ is
defined thus:

\[
B^{(n)}(q,\epsilon) = \{\, y^n \,:\, D(y^n\|q) \leq \epsilon \,\} .
\]
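
To make these definitions concrete, the following Python sketch computes the type of a sequence, the KL divergence with the conventions stated above, and membership in the ball $B^{(n)}(q,\epsilon)$. The function names and the example distribution `q` are hypothetical; logarithms are taken in base 2, which is an assumption of the sketch.

```python
import math
from collections import Counter

def type_of(seq, alphabet):
    """Empirical distribution t_{y^n}(y) = n(y | y^n) / n over the given alphabet."""
    counts = Counter(seq)
    n = len(seq)
    return {y: counts[y] / n for y in alphabet}

def kl(p, q):
    """D(p || q) in bits, with 0 * log(0/q) = 0 and p * log(p/0) = +infinity."""
    total = 0.0
    for y, py in p.items():
        if py == 0:
            continue
        if q[y] == 0:
            return math.inf
        total += py * math.log2(py / q[y])
    return total

def in_ball(seq, q, eps):
    """True iff the type of seq is within KL distance eps of q."""
    return kl(type_of(seq, q.keys()), q) <= eps

q = {"a": 0.5, "b": 0.5}           # hypothetical reference distribution
print(in_ball("aabb", q, 0.1))      # type equals q, divergence 0
print(in_ball("aaaa", q, 0.1))      # type is a point mass, divergence 1 bit
```

Note that KL divergence is not a metric (it is not symmetric), so “distance” here is meant informally, as in the text.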

We shall also make use of the following new terminology about sequences. Assume $|Act| =
k$. Given a sequence $y^n = (y_1, y_2, \ldots, y_n)$ and an integer $j = 1, \ldots, k$, we shall denote
by $y^n(j)$ the subsequence $(y_j, y_{k+j}, y_{2k+j}, \ldots)$, obtained by taking the symbols of $y^n$ at
position $j, k+j, 2k+j, \ldots$. In the rest of the section, unless otherwise stated, we let $\sigma$
be the infinite non-adaptive strategy that plays actions $a_1, \ldots, a_k, a_1, a_2, \ldots$, in a lock-step
fashion: $\sigma(y^j) = a_{(j \bmod k)+1}$. For any $n \geq 1$, we let $\sigma_n$ be the truncation at level $n$ of $\sigma$:
$\sigma_n = \sigma \backslash n$. For a prior $p(\cdot)$, let $p_{\sigma_n}$ be the resulting joint probability distribution on $\mathcal{X} \times \mathcal{Y}^n$:
note that, for each $x$, the support of $p_{\sigma_n}(\cdot|x)$ is included in $\mathcal{Y}^n$. Let $(X, Y^n)$ be jointly
distributed according to $p_{\sigma_n}$: here we have introduced the superscript $n$ to record explicitly
the dependence of $Y$ on $n$. Let us define the set of sequences $y^n$ where the type of each
sub-sequence $y^n(i)$ is within $\epsilon$ distance of the distribution $p_{a_i}(\cdot|x)$, thus:

\[
B^{(n)}(x,\epsilon) = \{\, y^n \,:\, D\big(y^n(i)\,\|\,p_{a_i}(\cdot|x)\big) \leq \epsilon \ \text{ for } i = 1, \ldots, k \,\} . \tag{B.1}
\]

Furthermore, we define the following quantities depending on a given sequence $y^n$ and
$x \in \mathcal{X}$:

\[
\mathbb{H}(y^n) = \sum_{i=1}^{k} H\big(y^n(i)\big) \tag{B.2}
\]

\[
\mathbb{D}\big(y^n \,\|\, p_\sigma(\cdot|x)\big) = \sum_{i=1}^{k} D\big(y^n(i) \,\|\, p_{a_i}(\cdot|x)\big) \tag{B.3}
\]

Finally, for each sequence $y^m \in \mathcal{Y}^m$ and for each action $a \in Act$, we let

\[
p_a(y^m|x) = \prod_{i=1}^{m} p_a(y_i|x) \tag{B.4}
\]

(this is the probability of generating $y^m$ with i.i.d. extractions obeying distribution $p_a(\cdot|x)$.)


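Lemma B.1. Let $n$ be a multiple of $k$ and $x \in \mathcal{X}$. Then

\[
p_{\sigma_n}(y^n|x) = 2^{-\frac{n}{k}\left(\mathbb{H}(y^n) + \mathbb{D}(y^n \| p_\sigma(\cdot|x))\right)} .
\]

Proof.

\[
\begin{aligned}
p_{\sigma_n}(y^n|x) &= \prod_{j=1}^{k} p_{a_j}\big(y^n(j)\,|\,x\big) && \text{(B.5)} \\
 &= \prod_{j=1}^{k} 2^{-\frac{n}{k}\left(H(y^n(j)) + D(y^n(j) \| p_{a_j}(\cdot|x))\right)} && \text{(B.6)} \\
 &= 2^{-\frac{n}{k}\left(\mathbb{H}(y^n) + \mathbb{D}(y^n \| p_\sigma(\cdot|x))\right)} && \text{(B.7)}
\end{aligned}
\]

where: (B.5) follows from re-arranging factors and the definition of $p_{a_j}(\cdot)$; (B.6) follows
from [m Theorem 11.1.2]; in (B.7) we have applied definitions (B.2) and (B.3). □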
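
The identity of Lemma B.1 can be verified numerically on a small instance. The Python sketch below is an illustration under stated assumptions: it fixes one secret $x$, uses $k = 2$ hypothetical conditional distributions (`channels`) over a binary alphabet, and compares the direct product of probabilities under the lock-step strategy with the expression obtained by summing, over the $k$ subsequences, entropy of the type plus divergence from the corresponding distribution.

```python
import math
from collections import Counter

def entropy(t):
    """Shannon entropy in bits of a distribution given as a list."""
    return -sum(v * math.log2(v) for v in t if v > 0)

def kl(t, q):
    """D(t || q) in bits; assumes q[i] > 0 wherever t[i] > 0."""
    return sum(v * math.log2(v / q[i]) for i, v in enumerate(t) if v > 0)

def sub_type(seq, j, k, size):
    """Type of the subsequence taken at (0-based) positions j, j+k, j+2k, ..."""
    sub = seq[j::k]
    counts = Counter(sub)
    return [counts[y] / len(sub) for y in range(size)]

# Hypothetical distributions p_{a_1}(.|x) and p_{a_2}(.|x) for a fixed secret x.
channels = [[0.7, 0.3], [0.2, 0.8]]
k = len(channels)
y = [0, 1, 0, 1, 1, 1]          # n = 6, a multiple of k

# Direct computation: the t-th observation (0-based) is produced by action t mod k.
direct = math.prod(channels[t % k][sym] for t, sym in enumerate(y))

# Computation via types: 2^{-(n/k) * sum_j (H(y^n(j)) + D(y^n(j) || p_{a_j}))}.
m = len(y) // k
exponent = sum(
    entropy(sub_type(y, j, k, 2)) + kl(sub_type(y, j, k, 2), channels[j])
    for j in range(k)
)
via_types = 2 ** (-m * exponent)

print(direct, via_types)
```

The two printed values should agree up to floating-point rounding.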


Below, for a set $A$ and a distribution $q(\cdot)$, we let $q(A)$ denote $\sum_{a \in A} q(a)$.


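Lemma B.2. Let $n$ be a multiple of $k$, $x \in \mathcal{X}$ and $\epsilon > 0$. Then

\[
p_{\sigma_n}\big(B^{(n)}(x,\epsilon)\,|\,x\big) \geq 1 - 2^{-\frac{n}{k}\epsilon}\Big(\frac{n}{k} + 1\Big)^{k|\mathcal{Y}|} C
\]

for some constant $C$, not depending on $n$.

Proof. Let $m = n/k$. We give a lower bound on the probability of $B^{(n)}(x,\epsilon)$ as follows.

\[
\begin{aligned}
p_{\sigma_n}\big(B^{(n)}(x,\epsilon)\,|\,x\big)
 &= \sum_{y^n \in B^{(n)}(x,\epsilon)} \ \prod_{i=1}^{k} p_{a_i}\big(y^n(i)\,|\,x\big) \\
 &= \prod_{i=1}^{k} \ \sum_{y^m \in B^{(m)}(p_{a_i}(\cdot|x),\,\epsilon)} p_{a_i}(y^m|x)
  \;=\; \prod_{i=1}^{k} p_{a_i}\big(B^{(m)}(p_{a_i}(\cdot|x),\epsilon)\,|\,x\big) && \text{(B.8)} \\
 &\geq \prod_{i=1}^{k} \Big(1 - 2^{-m\epsilon}(m+1)^{|\mathcal{Y}|}\Big)
  \;=\; \Big(1 - 2^{-m\epsilon}(m+1)^{|\mathcal{Y}|}\Big)^{k} && \text{(B.9)} \\
 &= 1 + \sum_{i=1}^{k} \binom{k}{i} (-1)^{i}\, 2^{-i m \epsilon}(m+1)^{i|\mathcal{Y}|}
  \;\geq\; 1 - 2^{-m\epsilon}(m+1)^{k|\mathcal{Y}|}\, C && \text{(B.10)}
\end{aligned}
\]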
where: the first equality in (B.8) follows from the definition of $B^{(n)}(x,\epsilon)$; the inequality
(B.9) follows from [HI Eq. 11.67]; in (B.10), $C = k \cdot \max_i \binom{k}{i}$ (note that $\binom{k}{i}$ is maximum
when $i = \lceil k/2 \rceil$.) □

Lemma B.3. Let $x, x' \in \mathcal{X}$, with $x \not\equiv x'$. Let $n \geq 1$. Then there is $\epsilon > 0$ such that
$B^{(n)}(x,2\epsilon) \cap B^{(n)}(x',2\epsilon) = \emptyset$.

Proof. It is well-known that given any two distinct distributions $p(\cdot)$ and $q(\cdot)$, there is
$\epsilon > 0$ such that $B^{(n)}(p,2\epsilon) \cap B^{(n)}(q,2\epsilon) = \emptyset$ (this is a consequence of Pinsker’s inequality,
[Ml Lemma 11.6.1].) Thus, choose $\epsilon > 0$ such that, for some $j$, $B^{(n)}\big(p_{a_j}(\cdot|x),2\epsilon\big) \cap
B^{(n)}\big(p_{a_j}(\cdot|x'),2\epsilon\big) = \emptyset$: the wanted statement follows from the definition of $B^{(n)}(x,2\epsilon)$ and
$B^{(n)}(x',2\epsilon)$. □

We are now set to prove Theorem 5.5.


Theorem B.4 (Theorem 5.5). There is a total, non-adaptive strategy $\sigma$ s.t. $I_\sigma(\mathcal{S},p) =
I(X;[X])$. Consequently, $I_\star(\mathcal{S},p) = I(X;[X])$.

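Proof. Using the notation previously introduced, we shall prove that, as $n \to \infty$,

\[
U(X|Y^n) \longrightarrow L \quad \text{for some } L \leq U(X|[X]) . \tag{B.11}
\]

This will imply the thesis, as then $I_\sigma(\mathcal{S},p) \geq I(X;[X])$, which, by virtue of Theorem 5.3,
implies $I_\sigma(\mathcal{S},p) = I(X;[X])$.

Let the equivalence classes of $\equiv$ be $c_1, \ldots, c_K$. For each $i = 1, \ldots, K$, choose a
representative $x_i \in c_i$ of nonzero probability (if it exists; otherwise class $c_i$ is just discarded.) We
can compute as follows.

\[
\begin{aligned}
U(X|Y^n) &= \sum_{y^n} p_{\sigma_n}(y^n)\, U\big(p_{\sigma_n}(\cdot|y^n)\big) \\
 &= \sum_{c_i} \sum_{x \in c_i} p(x) \sum_{y^n} p_{\sigma_n}(y^n|x)\, U\big(p_{\sigma_n}(\cdot|y^n)\big) \\
 &= \sum_{c_i} p(c_i) \sum_{y^n} p_{\sigma_n}(y^n|x_i)\, U\big(p_{\sigma_n}(\cdot|y^n)\big) \\
 &\leq \sum_{c_i} p(c_i)\, U\Big(\underbrace{\sum_{y^n} p_{\sigma_n}(y^n|x_i)\, p_{\sigma_n}(\cdot|y^n)}_{=\, q_i^n(\cdot)}\Big) && \text{(B.12)} \\
 &= \sum_{c_i} p(c_i)\, U\big(q_i^n(\cdot)\big) && \text{(B.13)}
\end{aligned}
\]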

where the inequality in (B.12) stems from $U$’s concavity and Jensen’s inequality. We will
now show that there is a sub-sequence of indices $\{n_j\}$ such that for each $i = 1, \ldots, K$,

\[
q_i^{n_j}(\cdot) \longrightarrow p(\cdot|c_i) \tag{B.14}
\]

(according to any chosen metrics in $\mathcal{P}$.) This will imply (B.11): in fact, by virtue of $U$’s
continuity, we will have, on the chosen sub-sequence, $\sum_{c_i} p(c_i)\, U\big(q_i^{n_j}(\cdot)\big) \to \sum_{c_i} p(c_i)\, U\big(p(\cdot|c_i)\big) =
U(X|[X])$. Hence, by virtue of (B.13), on the chosen sub-sequence and hence on every
sequence, we will have $U(X|Y^n) \to L \leq U(X|[X])$, which is (B.11).

In order to prove (B.14), take any $n \geq 1$ that is a multiple of $k$, and choose any $\epsilon > 0$
such that $B^{(n)}(x,2\epsilon) \cap B^{(n)}(x',2\epsilon) = \emptyset$ whenever $x \not\equiv x'$ (the existence of such an $\epsilon$ is
guaranteed by Lemma B.3.) Consider a generic $x \in c_i$ such that $p(x) > 0$. We have the
following lower bound for $q_i^n(x)$.

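\[
\begin{aligned}
q_i^n(x) &= \sum_{y^n} p_{\sigma_n}(y^n|x_i)\,
   \frac{p_{\sigma_n}(y^n|x_i)\, p(x)}{\sum_{x'} p_{\sigma_n}(y^n|x')\, p(x')} \\
 &= \sum_{y^n} \frac{p_{\sigma_n}(y^n|x_i)}
   {\dfrac{p(c_i)}{p(x)} + \sum_{x' \not\equiv x_i} \dfrac{p_{\sigma_n}(y^n|x')\, p(x')}{p_{\sigma_n}(y^n|x_i)\, p(x)}} && \text{(B.15)} \\
 &\geq \sum_{y^n \in B^{(n)}(x,\epsilon)} \frac{p_{\sigma_n}(y^n|x_i)}
   {\dfrac{p(c_i)}{p(x)} + \sum_{x' \not\equiv x_i} 2^{-\frac{n}{k}\left[\mathbb{D}(y^n \| p_\sigma(\cdot|x')) - \mathbb{D}(y^n \| p_\sigma(\cdot|x_i))\right]} \dfrac{p(x')}{p(x)}} && \text{(B.16)} \\
 &\geq \sum_{y^n \in B^{(n)}(x,\epsilon)} \frac{p_{\sigma_n}(y^n|x_i)}
   {\dfrac{p(c_i)}{p(x)} + \sum_{x' \not\equiv x_i} 2^{-\frac{n}{k}\left[2\epsilon - \mathbb{D}(y^n \| p_\sigma(\cdot|x_i))\right]} \dfrac{p(x')}{p(x)}} && \text{(B.17)} \\
 &\geq \frac{p_{\sigma_n}\big(B^{(n)}(x_i,\epsilon)\,|\,x_i\big)}
   {\dfrac{p(c_i)}{p(x)} + 2^{-\frac{n}{k}\epsilon}\, C'} && \text{(B.18)} \\
 &\geq \frac{1 - 2^{-\frac{n}{k}\epsilon}\, C \big(\frac{n}{k} + 1\big)^{k|\mathcal{Y}|}}
   {\dfrac{p(c_i)}{p(x)} + 2^{-\frac{n}{k}\epsilon}\, C'} && \text{(B.19)}
\end{aligned}
\]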

where: (B.15) follows from the definition of $q_i^n(x)$ and an application of Bayes’ rule, and from
the fact that $p_{\sigma_n}(y^n|x) = p_{\sigma_n}(y^n|x_i)$; (B.16) follows from a simple union bound and from
Lemma B.1; (B.17) follows from the fact that, by assumption, $B^{(n)}(x,2\epsilon) \cap B^{(n)}(x',2\epsilon) = \emptyset$
(also note that $B^{(n)}(x,2\epsilon) = B^{(n)}(x_i,2\epsilon)$); (B.18) follows by definition of $B^{(n)}(x,\epsilon) =
B^{(n)}(x_i,\epsilon)$; here $C'$ is a suitable constant, not depending on $n$; (B.19) follows from Lemma
B.2.

Now, let $\{n_j\}$ be a sequence of indices such that, for each $x$ and $i$, $q_i^{n_j}(x)$ converges to a
limit, say $L_i(x)$ (such a sub-sequence must exist, by Bolzano-Weierstrass.) The inequality

\[
q_i^n(x) \;\geq\; \frac{1 - 2^{-\frac{n}{k}\epsilon}\, C \big(\frac{n}{k} + 1\big)^{k|\mathcal{Y}|}}
   {\dfrac{p(c_i)}{p(x)} + 2^{-\frac{n}{k}\epsilon}\, C'}
\]

which holds for each $n$ that is a multiple of $k$, implies that these limits satisfy $L_i(x) \geq \frac{p(x)}{p(c_i)}$.
Since point-wise convergence for each $x$ implies convergence of $q_i^{n_j}(\cdot)$ to a probability
distribution, we have that, for each $i$ and $x$, actually equality must hold: $L_i(x) = \frac{p(x)}{p(c_i)} = p(x|c_i)$.
Thus, for each $i = 1, \ldots, K$, $q_i^{n_j}(\cdot) \to p(\cdot|c_i)$, which proves (B.14). □
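
The convergence stated in (B.14) can be observed numerically in a very simple setting. The Python sketch below is only an illustration under assumptions not taken from the paper: a single action, a binary observable, three secrets of which the first two are indistinguishable, and a hypothetical prior. It computes $q_i^n(x) = \sum_{y^n} p(y^n|x_i)\, p(x|y^n)$ exactly, grouping sequences by their number of ones, and compares it with $p(x|c_i)$.

```python
import math

def q_n(n, prior, rates, i_rep, x):
    """q_i^n(x) = sum over y^n of p(y^n | x_i) * p(x | y^n), single action, binary Y.

    rates[s] is the probability of observing 1 under secret s. Sequences are
    grouped by their number of ones j, since the likelihood depends only on j.
    """
    total = 0.0
    for j in range(n + 1):
        like = [r ** j * (1 - r) ** (n - j) for r in rates]
        evidence = sum(p * l for p, l in zip(prior, like))
        posterior_x = prior[x] * like[x] / evidence
        total += math.comb(n, j) * like[i_rep] * posterior_x
    return total

prior = [0.2, 0.3, 0.5]    # hypothetical prior on secrets 0, 1, 2
rates = [0.8, 0.8, 0.3]    # secrets 0 and 1 induce the same distribution
target = prior[0] / (prior[0] + prior[1])   # p(0 | c_1), with c_1 = {0, 1}

values = [q_n(n, prior, rates, 0, 0) for n in (1, 10, 200)]
print(values, target)
```

In this toy instance the computed values approach `target` as $n$ grows, in line with the exponentially vanishing correction terms in (B.19).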


Appendix C. Backward Induction 

We give some details of the derivation of an optimal strategy of length 2 for the system in
Example 3.2. We take Shannon entropy as the chosen uncertainty measure. In Figure 9
we give a partial representation of the related MDP. According to the Backward Induction
method, we compute the reward of each node, starting from the leaves of the MDP-tree and
then propagating them up to the root. Let us denote by R(n) the reward assigned to a
node n. Applying the algorithm, we compute as follows (recall that levels are counted from
the root, which has level 0).

• Level 4. The reward associated to each leaf node n is R(n) = 0.

Figure 9. A partial representation of the MDP for Example 3.2. Here, the symbols
‘z’, ‘d’ and ‘a’ are abbreviations for actions ZIP, Date and Age, respectively. For
simplicity, probability distributions labelling decision nodes are not shown. Moreover,
leaves with the same labels and the same father, and the corresponding incoming
arcs, have been coalesced.


• Level 3. Each probabilistic node, which is the a-child of a decision node h, receives as a
reward the sum of its immediate gain and the average reward of its children;

/2(4) = R{5) = log 5 — I — I log 3, i?(6) = R{8) = 0, R{7) = 5, R{9) = log 3 — |, and so 
on. 

• Level 2. Each decision node receives as a reward the maximum of its children’s rewards

(and the corresponding action is recorded): R{B) = log 5— | | log 3, R{C) = g, R{D) = 

log 3 — I and so on. 

• Level 1. Each probabilistic node receives the following rewards: i?(l) = log 10 

f log 3 « 2.11, i?(2) =logl0-^-ilog3«i2.27 and i?(3) = log 10 - f - | log3 2.4. 

• Level 0. The decision node A receives the reward R(A) = R(3) ≈ 2.4.
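
The level-by-level computation above follows the usual backward induction scheme: a probabilistic node gets its immediate gain plus the average reward of its children, and a decision node gets the maximum over its children. The following Python sketch shows this recursion on a toy mechanism. It is not the system of Example 3.2: the three actions (`parity`, `half`, `is_zero`), the four equiprobable secrets and all function names are hypothetical, and Shannon entropy is assumed as the uncertainty measure.

```python
import math

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(v * math.log2(v) for v in p if v > 0)

def branches(prior, channel):
    """Yield (y, p(y), posterior) for each observation y of positive probability."""
    for y in range(len(channel[0])):
        p_y = sum(px * row[y] for px, row in zip(prior, channel))
        if p_y > 0:
            yield y, p_y, [px * row[y] / p_y for px, row in zip(prior, channel)]

def best(prior, horizon, mechanism):
    """Return (maximum expected leakage, strategy tree) for strategies of the given length."""
    if horizon == 0:
        return 0.0, None          # leaves have reward 0
    best_value, best_tree = -math.inf, None
    for action, channel in mechanism.items():
        # Probabilistic node: immediate gain plus average reward of the children.
        value, children = entropy(prior), {}
        for y, p_y, post in branches(prior, channel):
            sub_value, sub_tree = best(post, horizon - 1, mechanism)
            value += p_y * (sub_value - entropy(post))
            children[y] = sub_tree
        # Decision node: keep the maximum and record the corresponding action.
        if value > best_value:
            best_value, best_tree = value, (action, children)
    return best_value, best_tree

# Hypothetical deterministic mechanism over secrets 0..3: rows are secrets,
# columns are observations, entries are p_a(y | x).
mechanism = {
    "parity":  [[1, 0], [0, 1], [1, 0], [0, 1]],
    "half":    [[1, 0], [1, 0], [0, 1], [0, 1]],
    "is_zero": [[0, 1], [1, 0], [1, 0], [1, 0]],
}
value, tree = best([0.25] * 4, 2, mechanism)
print(value, tree)
```

The recursion recomputes shared subproblems; for larger instances one would memoize on the posterior, which is omitted here to keep the sketch short.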

Taking into account the maximal rewards of the children selected at each decision node, we have
the following optimal strategy of length 2 (its tree representation is given in EigureEl) 


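\[
\sigma \;=\; \left[\;
\begin{array}{ll}
\varepsilon \mapsto \mathrm{Age}  & 65 \mapsto \mathrm{ZIP} \\
29 \mapsto \mathrm{ZIP}           & 66 \mapsto \mathrm{ZIP} \\
30 \mapsto \mathrm{Date}          & 67 \mapsto \mathrm{Date} \\
31 \mapsto \mathrm{ZIP}           & 68 \mapsto \mathrm{Date} \\
32 \mapsto \mathrm{Date}          & 69 \mapsto \mathrm{Date} \\
64 \mapsto \mathrm{Date}          &
\end{array}
\;\right] .
\]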